Commits · d57a3a151660902091491ac2633134e1be92557f · Kirill Smelkov / linux

08 Nov, 2022 23 commits

rxrpc: Save last ACK's SACK table rather than marking txbufs · d57a3a15

David Howells authored May 07, 2022

Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets.  Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.

We also look at the highest serial number that has been acked to try and
guess which packets we've transmitted the peer is likely to have seen.  If
necessary, we send a ping to retrieve that number.

One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

d57a3a15

rxrpc: Remove call->lock · 4e76bd40

David Howells authored May 06, 2022

call->lock is no longer necessary, so remove it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

4e76bd40

rxrpc: Don't use a ring buffer for call Tx queue · a4ea4c47

David Howells authored Mar 31, 2022

Change the way the Tx queueing works to make the following ends easier to
achieve:

 (1) The filling of packets, the encryption of packets and the transmission
     of packets can be handled in parallel by separate threads, rather than
     rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
     packet before moving onto the next one.

 (2) Get rid of the fixed-size ring which sets a hard limit on the number
     of packets that can be retained in the ring.  This allows the number
     of packets to increase without having to allocate a very large ring or
     having variable-sized rings.

     [Note: the downside of this is that it's then less efficient to locate
     a packet for retransmission as we then have to step through a list and
     examine each buffer in the list.]

 (3) Allow the filler/encrypter to run ahead of the transmission window.

 (4) Make it easier to do zero copy UDP from the packet buffers.

 (5) Make it easier to do zero copy from userspace to the packet buffers -
     and thence to UDP (only if for unauthenticated connections).

To that end, the following changes are made:

 (1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
     to be transmitted in.  This allows them to be placed on multiple
     queues simultaneously.  An sk_buff isn't really necessary as it's
     never passed on to lower-level networking code.

 (2) Keep the transmissable packets in a linked list on the call struct
     rather than in a ring.  As a consequence, the annotation buffer isn't
     used either; rather a flag is set on the packet to indicate ackedness.

 (3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
     transmitted has been queued.  Add RXRPC_CALL_TX_ALL_ACKED to indicate
     that all packets up to and including the last got hard acked.

 (4) Wire headers are now stored in the txbuf rather than being concocted
     on the stack and they're stored immediately before the data, thereby
     allowing zerocopy of a single span.

 (5) Don't bother with instant-resend on transmission failure; rather,
     leave it for a timer or an ACK packet to trigger.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a4ea4c47

rxrpc: Get rid of the Rx ring · 5d7edbc9

David Howells authored Aug 27, 2022

Get rid of the Rx ring and replace it with a pair of queues instead.  One
queue gets the packets that are in-sequence and are ready for processing by
recvmsg(); the other queue gets the out-of-sequence packets for addition to
the first queue as the holes get filled.

The annotation ring is removed and replaced with a SACK table.  The SACK
table has the bits set that correspond exactly to the sequence number of
the packet being acked.  The SACK ring is copied when an ACK packet is
being assembled and rotated so that the first ACK is in byte 0.

Flow control handling is altered so that packets that are moved to the
in-sequence queue are hard-ACK'd even before they're consumed - and then
the Rx window size in the ACK packet (rsize) is shrunk down to compensate
(even going to 0 if the window is full).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

5d7edbc9

rxrpc: Clone received jumbo subpackets and queue separately · d4d02d8b

David Howells authored Oct 07, 2022

Split up received jumbo packets into separate skbuffs by cloning the
original skbuff for each subpacket and setting the offset and length of the
data in that subpacket in the skbuff's private data.  The subpackets are
then placed on the recvmsg queue separately.  The security class then gets
to revise the offset and length to remove its metadata.

If we fail to clone a packet, we just drop it and let the peer resend it.
The original packet gets used for the final subpacket.

This should make it easier to handle parallel decryption of the subpackets.
It also simplifies the handling of lost or misordered packets in the
queuing/buffering loop as the possibility of overlapping jumbo packets no
longer needs to be considered.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

d4d02d8b

rxrpc: Split the rxrpc_recvmsg tracepoint · faf92e8d

David Howells authored Oct 07, 2022

Split the rxrpc_recvmsg tracepoint so that the tracepoints that are about
data packet processing (and which have extra pieces of information) are
separate from the tracepoint that shows the general flow of recvmsg().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

faf92e8d

rxrpc: Clean up ACK handling · 530403d9

David Howells authored Jan 30, 2020

Clean up the rxrpc_propose_ACK() function.  If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs.  All other
ACKs, bar terminal IDLE ACK are sent immediately.  The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.

Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval.  Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, are not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.

Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

530403d9

rxrpc: Allocate ACK records at proposal and queue for transmission · 72f0c6fb

David Howells authored Jan 30, 2020

Allocate rxrpc_txbuf records for ACKs and put onto a queue for the
transmitter thread to dispatch.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

72f0c6fb

rxrpc: Define rxrpc_txbuf struct to carry data to be transmitted · 02a19356

David Howells authored Apr 05, 2022

Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a
socket buffer so that it can be placed onto multiple queues at once.  This
also allows the data buffer to be in the same allocation as the internal
data.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

02a19356

rxrpc: Remove call->tx_phase · a11e6ff9

David Howells authored Oct 07, 2022

Remove call->tx_phase as it's only ever set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

a11e6ff9

rxrpc: Remove the flags from the rxrpc_skb tracepoint · 27f699cc

David Howells authored Oct 07, 2022

Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
be using this for the transmission buffers and so marking which are
transmission buffers isn't going to be necessary.

Note that this also remove the rxrpc skb flag that indicates if this is a
transmission buffer and so the count is not updated for the moment.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

27f699cc

rxrpc: Remove unnecessary header inclusions · 23b237f3

David Howells authored Jan 23, 2020

Remove a bunch of unnecessary header inclusions.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

23b237f3

rxrpc: Call udp_sendmsg() directly · ed472b0c

David Howells authored Mar 22, 2022

Call udp_sendmsg() and udpv6_sendmsg() directly rather than calling
kernel_sendmsg() as the latter assumes we want a kvec-class iterator.
However, zerocopy explicitly doesn't work with such an iterator.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

ed472b0c

rxrpc: Use the core ICMP/ICMP6 parsers · b6c66c43

David Howells authored Oct 12, 2022

Make rxrpc_encap_rcv_err() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or
ipv6_icmp_error() as appropriate to do the parsing rather than trying to do
it in rxrpc.

This pushes an error report onto the UDP socket's error queue and calls
->sk_error_report() from which point rxrpc can pick it up.

It would be preferable to steal the packet directly from ip*_icmp_error()
rather than letting it get queued, but this is probably good enough.

Also note that __udp4_lib_err() calls sk_error_report() twice in some
cases.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

b6c66c43

net: Change the udp encap_err_rcv to allow use of {ip,ipv6}_icmp_error() · 42fb06b3

David Howells authored Oct 12, 2022

Change the udp encap_err_rcv signature to match ip_icmp_error() and
ipv6_icmp_error() so that those can be used from the called function and
export them.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org

42fb06b3

rxrpc: Fix ack.bufferSize to be 0 when generating an ack · 8889a711

David Howells authored Sep 07, 2022

ack.bufferSize should be set to 0 when generating an ack.

Fixes: 8d94aa38 ("rxrpc: Calls shouldn't hold socket refs")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

8889a711

rxrpc: Record stats for why the REQUEST-ACK flag is being set · f7fa5242

David Howells authored Aug 18, 2022

Record stats for why the REQUEST-ACK flag is being set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

f7fa5242

rxrpc: Record statistics about ACK types · f2a676d1

David Howells authored May 11, 2022

Record statistics about the different types of ACKs that have been
transmitted and received and the number of ACKs that have been filled out
and transmitted or that have been skipped.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

f2a676d1

rxrpc: Add stats procfile and DATA packet stats · b0154246

David Howells authored May 11, 2022

Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
what rxrpc has been doing.  Writing a blank line to the stats file will
clear the increment-only counters.  Allocated resource counters don't get
cleared.

Add some counters to count various things about DATA packets, including the
number created, transmitted and retransmitted and the number received, the
number of ACK-requests markings and the number of jumbo packets received.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

b0154246

rxrpc: Track highest acked serial · 589a0c1e

David Howells authored Apr 28, 2022

Keep track of the highest DATA serial number that has been acked by the
peer for future purposes.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

589a0c1e

rxrpc: Split call timer-expiration from call timer-set tracepoint · 334dfbfc

David Howells authored Apr 22, 2022

Split the tracepoint for call timer-set to separate out the call
timer-expiration event
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

334dfbfc

rxrpc: Trace setting of the request-ack flag · 4d843be5

David Howells authored Apr 05, 2022

Add a tracepoint to log why the request-ack flag is set on an outgoing DATA
packet, allowing debugging as to why.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org

4d843be5

net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write() · c3d96f69

David Howells authored Oct 03, 2022

Provide a CONFIG_PROC_FS=n fallback for proc_create_net_single_write().

Also provide a fallback for proc_create_net_data_write().

Fixes: 564def71 ("proc: Add a way to make network proc files writable")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org

c3d96f69

28 Oct, 2022 17 commits

nfc: s3fwrn5: use devm_clk_get_optional_enabled() helper · f8f797f3

Dmitry Torokhov authored Oct 27, 2022

Because we enable the clock immediately after acquiring it in probe,
we can combine the 2 operations and use devm_clk_get_optional_enabled()
helper.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f8f797f3

Merge branch 'txgbe' · 92c7076e

David S. Miller authored Oct 28, 2022

Jiawen Wu says:

====================
net: WangXun txgbe ethernet driver

This patch series adds support for WangXun 10 gigabit NIC, to initialize
hardware, set mac address, and register netdev.

Change log:
v6: address comments:
    Jakub Kicinski: check with scripts/kernel-doc
v5: address comments:
    Jakub Kicinski: clean build with W=1 C=1
v4: address comments:
    Andrew Lunn: https://lore.kernel.org/all/YzXROBtztWopeeaA@lunn.ch/
v3: address comments:
    Andrew Lunn: remove hw function ops, reorder functions, use BIT(n)
                 for register bit offset, move the same code of txgbe
                 and ngbe to libwx
v2: address comments:
    Andrew Lunn: https://lore.kernel.org/netdev/YvRhld5rD%2FxgITEg@lunn.ch/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

92c7076e

net: txgbe: Set MAC address and register netdev · d21d2c7f

Jiawen Wu authored Oct 27, 2022

Add MAC address related operations, and register netdev.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d21d2c7f

net: txgbe: Reset hardware · b0801256

Jiawen Wu authored Oct 27, 2022

Reset and initialize the hardware by configuring the MAC layer.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0801256

net: txgbe: Store PCI info · a34b3e6e

Jiawen Wu authored Oct 27, 2022

Get PCI config space info, set LAN id and check flash status.
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a34b3e6e

Merge branch 'tcp-plb' · 957ed5e7

David S. Miller authored Oct 28, 2022

Mubashir Adnan Qureshi says:

====================
net: Add PLB functionality to TCP

This patch series adds PLB (Protective Load Balancing) to TCP and hooks
it up to DCTCP. PLB is disabled by default and can be enabled using
relevant sysctls and support from underlying CC.

PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

957ed5e7

tcp: add rcv_wnd and plb_rehash to TCP_INFO · 71fc7047

Mubashir Adnan Qureshi authored Oct 26, 2022

rcv_wnd can be useful to diagnose TCP performance where receiver window
becomes the bottleneck. rehash reports the PLB and timeout triggered
rehash attempts by the TCP connection.
Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

71fc7047

tcp: add u32 counter in tcp_sock and an SNMP counter for PLB · 29c1c446

Mubashir Adnan Qureshi authored Oct 26, 2022

A u32 counter is added to tcp_sock for counting the number of PLB
triggered rehashes for a TCP connection. An SNMP counter is also
added to count overall PLB triggered rehash events for a host. These
counters are hooked up to PLB implementation for DCTCP.

TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports
the rehash attempts triggered due to PLB or timeouts. This gives
a historical view of sustained congestion or timeouts experienced
by the TCP connection.
Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29c1c446

tcp: add support for PLB in DCTCP · c30f8e0b

Mubashir Adnan Qureshi authored Oct 26, 2022

PLB support is added to TCP DCTCP code. As DCTCP uses ECN as the
congestion signal, PLB also uses ECN to make decisions whether to change
the path or not upon sustained congestion.
Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c30f8e0b

tcp: add PLB functionality for TCP · 1a91bb7c

Mubashir Adnan Qureshi authored Oct 26, 2022

Congestion control algorithms track PLB state and cause the connection
to trigger a path change when either of the 2 conditions is satisfied:

- No packets are in flight and (# consecutive congested rounds >=
  sysctl_tcp_plb_idle_rehash_rounds)
- (# consecutive congested rounds >= sysctl_tcp_plb_rehash_rounds)

A round (RTT) is marked as congested when congestion signal
(ECN ce_ratio) over an RTT is greater than sysctl_tcp_plb_cong_thresh.
In the event of RTO, PLB (via tcp_write_timeout()) triggers a path
change and disables congestion-triggered path changes for random time
between (sysctl_tcp_plb_suspend_rto_sec, 2*sysctl_tcp_plb_suspend_rto_sec)
to avoid hopping onto the "connectivity blackhole". RTO-triggered
path changes can still happen during this cool-off period.
Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a91bb7c

tcp: add sysctls for TCP PLB parameters · bd456f28

Mubashir Adnan Qureshi authored Oct 26, 2022

PLB (Protective Load Balancing) is a host based mechanism for load
balancing across switch links. It leverages congestion signals(e.g. ECN)
from transport layer to randomly change the path of the connection
experiencing congestion. PLB changes the path of the connection by
changing the outgoing IPv6 flow label for IPv6 connections (implemented
in Linux by calling sk_rethink_txhash()). Because of this implementation
mechanism, PLB can currently only work for IPv6 traffic. For more
information, see the SIGCOMM 2022 paper:
  https://doi.org/10.1145/3544216.3544226

This commit adds new sysctl knobs and sets their default values for
TCP PLB.
Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd456f28

Merge branch 'mxl-gpy-MDI-X' · 7f86cf50

David S. Miller authored Oct 28, 2022

Raju Lakkaraju says:

====================
net: phy: mxl-gpy: Add MDI-X

This patch series add the MDI-X feature to GPY211 PHYs and
Also Change return type to gpy_update_interface() function
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7f86cf50

net: phy: mxl-gpy: Add PHY Auto/MDI/MDI-X set driver for GPY211 chips · fd8825cd

Raju Lakkaraju authored Oct 26, 2022

Add support for MDI-X status and configuration for GPY211 chips
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd8825cd

net: phy: mxl-gpy: Change gpy_update_interface() function return type · 7a495dde

Raju Lakkaraju authored Oct 26, 2022

gpy_update_interface() is called from gpy_read_status() which does
return error codes. gpy_read_status() would benefit from returning
-EINVAL, etc.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a495dde

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next · 12dee519

Jakub Kicinski authored Oct 27, 2022

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) Move struct nft_payload_set definition to .c file where it is
   only used.

2) Shrink transport and inner header offset fields in the nft_pktinfo
   structure to 16-bits, from Florian Westphal.

3) Get rid of nft_objref Kbuild toggle, make it built-in into
   nf_tables. This expression is used to instantiate conntrack helpers
   in nftables. After removing the conntrack helper auto-assignment
   toggle it this feature became more important so move it to the nf_tables
   core module. Also from Florian.

4) Extend the existing function to calculate payload inner header offset
   to deal with the GRE and IPIP transport protocols.

6) Add inner expression support for nf_tables. This new expression
   provides a packet parser for tunneled packets which uses a userspace
   description of the expected inner headers. The inner expression
   invokes the payload expression (via direct call) to match on the
   inner header protocol fields using the inner link, network and
   transport header offsets.

   An example of the bytecode generated from userspace to match on
   IP source encapsulated in a VxLAN packet:

   # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4
     netdev x y
       [ meta load l4proto => reg 1 ]
       [ cmp eq reg 1 0x00000011 ]
       [ payload load 2b @ transport header + 2 => reg 1 ]
       [ cmp eq reg 1 0x0000b512 ]
       [ inner type vxlan hdrsize 8 flags f [ meta load protocol => reg 1 ] ]
       [ cmp eq reg 1 0x00000008 ]
       [ inner type vxlan hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ]
       [ cmp eq reg 1 0x04030201 ]

7) Store inner link, network and transport header offsets in percpu
   area to parse inner packet header once only. Matching on a different
   tunnel type invalidates existing offsets in the percpu area and it
   invokes the inner tunnel parser again.

8) Add support for inner meta matching. This support for
   NFTA_META_PROTOCOL, which specifies the inner ethertype, and
   NFT_META_L4PROTO, which specifies the inner transport protocol.

9) Extend nft_inner to parse GENEVE optional fields to calculate the
   link layer offset.

10) Update inner expression so tunnel offset points to GRE header
    to normalize tunnel header handling. This also allows to perform
    different interpretations of the GRE header from userspace.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nft_inner: set tunnel offset to GRE header offset
  netfilter: nft_inner: add geneve support
  netfilter: nft_meta: add inner match support
  netfilter: nft_inner: add percpu inner context
  netfilter: nft_inner: support for inner tunnel header matching
  netfilter: nft_payload: access ipip payload for inner offset
  netfilter: nft_payload: access GRE payload via inner offset
  netfilter: nft_objref: make it builtin
  netfilter: nf_tables: reduce nft_pktinfo by 8 bytes
  netfilter: nft_payload: move struct nft_payload_set definition where it belongs
====================

Link: https://lore.kernel.org/r/20221026132227.3287-1-pablo@netfilter.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

12dee519

net: dpaa2-eth: Simplify bool conversion · 148b811c

Yang Li authored Oct 26, 2022

./drivers/net/ethernet/freescale/dpaa2/dpaa2-xsk.c:453:42-47: WARNING: conversion to bool not needed here

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2577Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221026051824.38730-1-yang.lee@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

148b811c

Merge branch 'ionic-vf-attr-replay-and-other-updates' · 3e8e4aa5

Jakub Kicinski authored Oct 27, 2022

Shannon Nelson says:

====================
ionic: VF attr replay and other updates

For better VF management when a FW update restart or a FW crash recover is
detected, the PF now will replay any user specified VF attributes to be
sure the FW hasn't lost them in the restart.

Newer FW offers more packet processing offloads, so we now support them in
the driver.

A small refactor of the Rx buffer fill cleans a bit of code and will help
future work on buffer caching.
====================

Link: https://lore.kernel.org/r/20221026143744.11598-1-snelson@pensando.ioSigned-off-by: Jakub Kicinski <kuba@kernel.org>

3e8e4aa5