Commits · 9e300987d4a81fb95c323f042dd5aa484f4eb3dd · Kirill Smelkov / linux

28 Oct, 2021 40 commits

ice: VXLAN and Geneve TC support · 9e300987

Michal Swiatkowski authored Oct 12, 2021

Add definition for VXLAN and Geneve dummy packet. Define VXLAN and
Geneve type of fields to match on correct UDP tunnel header.

Parse tunnel specific fields from TC tool like outer MACs, outer IPs,
outer destination port and VNI. Save values and masks in outer header
struct and move header pointer to inner to simplify parsing inner
values.

There are two cases for redirect action:
- from uplink to VF - TC filter is added on tunnel device
- from VF to uplink - TC filter is added on PR, for this case check if
  redirect device is tunnel device

VXLAN example:
- create tunnel device
ip l add $VXLAN_DEV type vxlan id $VXLAN_VNI dstport $VXLAN_PORT \
dev $PF
- add TC filter (in switchdev mode)
tc filter add dev $VXLAN_DEV protocol ip parent ffff: flower \
enc_dst_ip $VF1_IP enc_key_id $VXLAN_VNI action mirred egress \
redirect dev $VF1_PR

Geneve example:
- create tunnel device
ip l add $GENEVE_DEV type geneve id $GENEVE_VNI dstport $GENEVE_PORT \
remote $GENEVE_IP
- add TC filter (in switchdev mode)
tc filter add dev $GENEVE_DEV protocol ip parent ffff: flower \
enc_key_id $GENEVE_VNI dst_ip $GENEVE1_IP action mirred egress \
redirect dev $VF1_PR
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: support for indirect notification · 195bb48f

Michal Swiatkowski authored Oct 12, 2021

Implement indirect notification mechanism to support offloading TC rules
on tunnel devices.

Keep the indirect block list in netdev priv. The notification calls the
function that sets up tc cls flower. For now only the ingress type can be
offloaded; return not supported for any other flow block binder type.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


net: virtio: use eth_hw_addr_set() · f2edaa4a

Jakub Kicinski authored Oct 27, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Even though the current code uses dev->addr_len, we can switch
to eth_hw_addr_set() instead of dev_addr_set(). The netdev is
always allocated by alloc_etherdev_mq() and there are at least two
places which assume Ethernet address:
 - the line below calling eth_hw_addr_random()
 - virtnet_set_mac_address() -> eth_commit_mac_addr_change()
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027152012.3393077-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


devlink: Simplify internal devlink params implementation · ee775b56

Leon Romanovsky authored Oct 28, 2021

Reduce extra indirection from devlink_params_*() API. Such change
makes it clear that we can drop devlink->lock from these flows, because
everything is executed when the devlink is not registered yet.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'octeontx2-debugfs-updates' · b0e77fcc

David S. Miller authored Oct 28, 2021

Rakesh Babu Saladi says:

====================
RVU Debugfs updates.

Patch 1: Few minor changes such as spelling mistakes, deleting unwanted
characters, etc.
Patch 2: Add debugfs dump for lmtst map table
Patch 3: Add channel and channel mask in debugfs.

Changes made from v2 to v3:
1. In patch 1 moved few lines and submitted those changes as a
different patch to net branch
2. Patch 2 is left unchanged.
3. Patch 3 is left unchanged.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


octeontx2-af: debugfs: Add channel and channel mask. · 9716a40a

Rakesh Babu authored Oct 27, 2021

Display channel and channel_mask for each RX
interface of an NPC MCAM rule.
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


octeontx2-af: cn10k: debugfs for dumping LMTST map table · 0daa55d0

Harman Kalra authored Oct 27, 2021

CN10k SoCs use atomic stores of up to 128 bytes to submit
packets/instructions into co-processor cores. The enqueueing is performed
using Large Memory Transaction Store (LMTST) operations. They allow for
lockless enqueue operations - i.e., two different CPU cores can submit
instructions to the same queue without needing to lock the queue or
synchronize their accesses.

This patch implements a new debugfs entry for dumping LMTST map
table present on CN10K, as this might be very useful to debug any issue
in case of shared LMTST region among multiple pci functions.
Signed-off-by: Harman Kalra <hkalra@marvell.com>
Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


octeontx2-af: debugfs: Minor changes. · 1910ccf0

Rakesh Babu Saladi authored Oct 27, 2021

A few changes to rvu_debugfs.c: remove unwanted characters,
fix code indentation, add a new comment line, etc.
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: phy: microchip_t1: add cable test support for lan87xx phy · 78805025

Yuiko Oshino authored Oct 27, 2021

Add a basic cable test (diagnostic) support for lan87xx phy.
Tested with LAN8770 for connected/open/short wires using ethtool.
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ptp: fix code indentation issues · 11195bf5

Carlos Llamas authored Oct 27, 2021

This fixes the following checkpatch.pl errors:

ERROR: code indent should use tabs where possible
+^I        if (ptp->pps_source)$

ERROR: code indent should use tabs where possible
+^I                pps_unregister_source(ptp->pps_source);$

ERROR: code indent should use tabs where possible
+^I                kthread_destroy_worker(ptp->kworker);$

Fixes: 4225fea1 ("ptp: Fix possible memory leak in ptp_clock_register()")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: cleanup __sk_stream_memory_free() · a406290a

Eric Dumazet authored Oct 27, 2021

We now have INDIRECT_CALL_INET_1() macro, no need to use #ifdef CONFIG_INET
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
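
The idea can be sketched in userspace as follows. This is a minimal model, not the kernel code: every MODEL_/model_ name, the simplified socket struct and the MODEL_CONFIG_INET switch are assumptions made for illustration. It shows how a wrapper macro that collapses to a plain indirect call when the option is off lets the caller drop its own #ifdef.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's CONFIG_INET option (assumption for this sketch). */
#define MODEL_CONFIG_INET 1

/* Compare the pointer with the expected target and call it directly on a match. */
#define MODEL_INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

/* The INET flavour hides the config check, so callers need no #ifdef. */
#if MODEL_CONFIG_INET
#define MODEL_INDIRECT_CALL_INET_1(f, f1, ...) \
	MODEL_INDIRECT_CALL_1(f, f1, __VA_ARGS__)
#else
#define MODEL_INDIRECT_CALL_INET_1(f, f1, ...) (f)(__VA_ARGS__)
#endif

struct model_sock;
typedef bool (*model_memory_free_fn)(const struct model_sock *sk, int wake);

/* Hypothetical, heavily simplified socket. */
struct model_sock {
	int queued;                              /* bytes queued for transmit */
	int limit;                               /* send buffer limit */
	model_memory_free_fn stream_memory_free; /* optional protocol hook */
};

static bool model_tcp_stream_memory_free(const struct model_sock *sk, int wake)
{
	(void)wake;
	return sk->queued < sk->limit;
}

/* Caller without any #ifdef: the macro picks the right call form. */
static bool model_sk_stream_memory_free(const struct model_sock *sk, int wake)
{
	if (!sk->stream_memory_free)
		return true;
	return MODEL_INDIRECT_CALL_INET_1(sk->stream_memory_free,
					  model_tcp_stream_memory_free,
					  sk, wake);
}
```

Setting MODEL_CONFIG_INET to 0 changes only the macro expansion; model_sk_stream_memory_free() stays untouched.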


sky2: Remove redundant assignment and parentheses · 6a03bfbd

luo penghao authored Oct 28, 2021

The variable err is reassigned on the subsequent branches, and the value
stored by this assignment is never read. Once the assignment is removed,
the double parentheses become redundant, so the inner parentheses are
deleted as well.

clang_analyzer complains as follows:

drivers/net/ethernet/marvell/sky2.c:4988: warning:

Although the value stored to 'err' is used in the enclosing expression,
the value is never actually read from 'err'.

Changes in v2:

modify title category:octeontx2-af to sky2.
delete the inner parentheses.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
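
The pattern the analyzer flags can be sketched like this. The code is a hypothetical userspace model, not the sky2 source: model_set_mask() and both setup functions are invented for illustration. In the "before" variant the value stored in err only serves as the condition and is overwritten on both branches; the "after" variant tests the call result directly and behaves the same.

```c
#include <errno.h>

/* Hypothetical helper: returns 0 on success, -EIO on failure. */
static int model_set_mask(int supported)
{
	return supported ? 0 : -EIO;
}

/* Before: err is assigned inside the condition, but never read from there. */
static int model_setup_before(int wide_ok, int narrow_ok, int *using_wide)
{
	int err;

	*using_wide = 0;
	if (!(err = model_set_mask(wide_ok))) {
		*using_wide = 1;
		err = model_set_mask(wide_ok);   /* overwrites the stored value */
	} else {
		err = model_set_mask(narrow_ok); /* overwrites it here as well */
	}
	return err;
}

/* After: no assignment in the condition, so no extra parentheses either. */
static int model_setup_after(int wide_ok, int narrow_ok, int *using_wide)
{
	int err;

	*using_wide = 0;
	if (!model_set_mask(wide_ok)) {
		*using_wide = 1;
		err = model_set_mask(wide_ok);
	} else {
		err = model_set_mask(narrow_ok);
	}
	return err;
}
```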


net: ipconfig: Release the rtnl_lock while waiting for carrier · ee046d9a

Maxime Chevallier authored Oct 28, 2021

While waiting for a carrier to come up on one of the netdevices, some
devices need to take the rtnl lock at some point to fully
initialize all parts of the link.

That's the case for SFP, where the rtnl is taken when a module gets
detected. This prevents mounting an NFS rootfs over an SFP link.

This means that while ipconfig waits for carriers to be detected, no SFP
modules can be detected; they are only detected after
ipconfig times out.

This commit releases the rtnl_lock while waiting for the carrier to come
up, and re-takes it to check for the init device and carrier status.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


devlink: add documentation for octeontx2 driver · 442e796f

Subbaraya Sundeep authored Oct 28, 2021

Add a file to document devlink support for octeontx2
driver. Driver-specific parameters implemented by
AF, PF and VF drivers are documented.
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sch_htb: Add extack messages for EOPNOTSUPP errors · 648a991c

Maxim Mikityanskiy authored Oct 28, 2021

In order to make the "Operation not supported" message clearer to the
user, add extack messages explaining exactly why adding offloaded HTB
is not supported in each case.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge tag 'mlx5-net-next-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 3a26babb

David S. Miller authored Oct 28, 2021

Saeed Mahameed says:

====================
Merge mlx5-next into net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'mvpp2-phylink' · 1feef2de

David S. Miller authored Oct 28, 2021

Russell King says:

====================
Convert mvpp2 to phylink supported_interfaces

This patch series converts mvpp2 to use phylink's supported_interfaces
bitmap to simplify the validate() implementation. The patches:

1) Add the supported interface modes to the supported_interfaces bitmap.
2) Removes the checks for the interface type being supported from
   the validate callback
3) Removes the now unnecessary checks and call to
   phylink_helper_basex_speed() to support switching between
   1000base-X and 2500base-X for SFPs
4) Cleans up the resulting validate() code.

(3) becomes possible because when asking the MAC for its complete
support, we walk all supported interfaces which will include 1000base-X
and 2500base-X only if the comphy is present.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


net: mvpp2: clean up mvpp2_phylink_validate() · b63f1117

Russell King (Oracle) authored Oct 27, 2021

mvpp2_phylink_validate() no longer needs to check for
PHY_INTERFACE_MODE_NA as phylink will walk the supported interface
types to discover the link mode capabilities. Remove these checks.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: mvpp2: drop use of phylink_helper_basex_speed() · 76947a63

Russell King (Oracle) authored Oct 27, 2021

Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: mvpp2: remove interface checks in mvpp2_phylink_validate() · 6c0c4b7a

Russell King (Oracle) authored Oct 27, 2021

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
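
A rough userspace sketch of why the driver-side check becomes redundant. All names here (model_mac, model_core_validate() and the interface enum) are hypothetical and do not mirror the phylink API; the point is only that a core layer which tests the bitmap first never hands an unsupported mode to the driver callback.

```c
#include <errno.h>

/* Hypothetical interface modes for this sketch. */
enum model_iface {
	MODEL_IFACE_SGMII,
	MODEL_IFACE_1000BASEX,
	MODEL_IFACE_2500BASEX,
	MODEL_IFACE_10GBASER,
};

struct model_mac {
	unsigned long supported_interfaces; /* one bit per model_iface value */
	int validate_calls;                 /* how often the driver hook ran */
};

/* Driver side: advertise a mode once, at setup time. */
static void model_mac_add_interface(struct model_mac *mac, enum model_iface iface)
{
	mac->supported_interfaces |= 1UL << iface;
}

/* Driver validate hook: needs no interface check of its own any more. */
static void model_mac_validate(struct model_mac *mac)
{
	mac->validate_calls++;
}

/* Core side: reject unsupported modes before the driver is involved. */
static int model_core_validate(struct model_mac *mac, enum model_iface iface)
{
	if (!(mac->supported_interfaces & (1UL << iface)))
		return -EINVAL;
	model_mac_validate(mac);
	return 0;
}
```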


net: mvpp2: populate supported_interfaces member · 8498e17e

Russell King authored Oct 27, 2021

Populate the phy interface mode bitmap for the Marvell mvpp2 driver
with interfaces modes supported by the MAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: enable net.ipv6.route.max_size sysctl in network namespace · 06e6c88f

Alexander Kuznetsov authored Oct 27, 2021

We want to increase the route cache size in a network namespace
created with a user namespace. Currently ipv6 route settings
are disabled for non-initial network namespaces.
Allowing this sysctl is safe since commit <6126891c>, because the
route cache is accounted to kmem, so users in a user namespace
cannot DoS the system.
Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
Acked-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>


mpt fusion: use dev_addr_set() · e0b4f1cd

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


firewire: don't write directly to netdev->dev_addr · aaaaa137

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Prepare fwnet_hwaddr on the stack and use dev_addr_set() to copy
it to netdev->dev_addr. We no longer need to worry about alignment.
union fwnet_hwaddr does not have any padding and we set all fields
so we don't need to zero it upfront.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


media: use eth_hw_addr_set() · 707182e4

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Convert media from memcpy(... 6) and memcpy(... addr_len) to
eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, 6)
  + eth_hw_addr_set(dev, np)
  @@
  - memcpy(dev->dev_addr, np, dev->addr_len)
  + eth_hw_addr_set(dev, np)

Make sure we don't cast off const qualifier from dev->dev_addr.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
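
Why the writes have to go through a helper can be modelled in a few lines of userspace C. This is an illustrative sketch only: struct model_netdev, model_eth_hw_addr_set() and the indexed_addr field (a stand-in for the lookup tree key) are assumptions, not the kernel definitions. Drivers see the address through a const pointer, and the one helper that is allowed to write also keeps the lookup key in sync.

```c
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical, simplified net device. */
struct model_netdev {
	const unsigned char *dev_addr;               /* read-only view for drivers */
	unsigned char addr_storage[MODEL_ETH_ALEN];  /* backing store */
	unsigned char indexed_addr[MODEL_ETH_ALEN];  /* stand-in for the tree key */
};

static void model_netdev_init(struct model_netdev *dev)
{
	memset(dev, 0, sizeof(*dev));
	dev->dev_addr = dev->addr_storage;
}

/* The only writer: updates the address and the lookup key together. */
static void model_eth_hw_addr_set(struct model_netdev *dev,
				  const unsigned char *addr)
{
	memcpy(dev->addr_storage, addr, MODEL_ETH_ALEN);
	memcpy(dev->indexed_addr, addr, MODEL_ETH_ALEN);
}

/* Returns 1 while the lookup key matches the device address. */
static int model_index_in_sync(const struct model_netdev *dev)
{
	return memcmp(dev->dev_addr, dev->indexed_addr, MODEL_ETH_ALEN) == 0;
}
```

With dev_addr declared const, a direct memcpy(dev->dev_addr, ...) from a driver discards the qualifier and draws a compiler diagnostic, which is the property the conversion relies on.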


Merge branch 'tcp-tx-side-cleanups' · 701b9519

David S. Miller authored Oct 28, 2021

Eric Dumazet says:

====================
tcp: tx side cleanups

We no longer need to set skb->reserved_tailroom because
TCP sendmsg() does not put payload in skb->head anymore.

Also do some cleanups around skb->ip_summed/csum,
and TCP_SKB_CB(skb)->sacked for fresh skbs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: do not clear TCP_SKB_CB(skb)->sacked if already zero · 8b7d8c2b

Eric Dumazet authored Oct 27, 2021

Freshly allocated skbs have zero in skb->cb[] already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: do not clear skb->csum if already zero · 4f226674

Eric Dumazet authored Oct 27, 2021

Freshly allocated skbs have their csum field cleared already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: factorize ip_summed setting · a52fe46e

Eric Dumazet authored Oct 27, 2021

Setting skb->ip_summed to CHECKSUM_PARTIAL can be centralized
in tcp_stream_alloc_skb() and __mptcp_do_alloc_tx_skb()
instead of being done multiple times.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: no longer set skb->reserved_tailroom · f401da47

Eric Dumazet authored Oct 27, 2021

TCP/MPTCP sendmsg() no longer puts payload in skb->head,
so we can remove the code that is no longer needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: remove dead code from tcp_collapse_retrans() · bd446314

Eric Dumazet authored Oct 27, 2021

TCP sendmsg() no longer puts payload in skb->head,
remove some dead code from tcp_collapse_retrans().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: cleanup tcp_remove_empty_skb() use · 27728ba8

Eric Dumazet authored Oct 27, 2021

All tcp_remove_empty_skb() callers now use tcp_write_queue_tail()
for the skb argument, we can therefore factorize code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


tcp: remove dead code from tcp_sendmsg_locked() · 3ded97bc

Eric Dumazet authored Oct 27, 2021

TCP sendmsg() no longer puts payload in skb head, we can remove
dead code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


Merge branch 'mlx5-next' of... · 573bce9e

Saeed Mahameed authored Oct 27, 2021

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net: phy: Fix unsigned comparison with less than zero · 911e3a46

Jiapeng Chong authored Oct 27, 2021

Fix the following coccicheck warning:

./drivers/net/phy/at803x.c:493:5-10: WARNING: Unsigned expression
compared with zero: value < 0.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 7beecaf7 ("net: phy: at803x: improve the WOL feature")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/1635325191-101815-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
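
The class of bug behind this warning can be shown in a small userspace sketch. The functions below are hypothetical and are not taken from at803x.c. When a negative error code is stored in an unsigned variable, a later "value < 0" check can never fire; keeping the result in a signed int until the error check is done avoids that.

```c
#include <errno.h>
#include <limits.h>

/* Hypothetical register read: negative errno on failure, value otherwise. */
static int model_read_reg(int fail)
{
	return fail ? -EIO : 0x1234;
}

/*
 * Buggy shape: the result goes straight into an unsigned variable, so
 * "if (value < 0)" would always be false. The error is still there, but
 * it now looks like a huge positive number. Returns 1 if that happened.
 */
static int model_error_hidden_in_unsigned(int fail)
{
	unsigned int value = (unsigned int)model_read_reg(fail);

	return value > (unsigned int)INT_MAX;
}

/* Fixed shape: check the signed result first, then convert. */
static int model_read_checked(int fail, unsigned int *out)
{
	int ret = model_read_reg(fail);

	if (ret < 0)
		return ret;
	*out = (unsigned int)ret;
	return 0;
}
```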


Merge branch 'mptcp-rework-fwd-memory-allocation-and-one-cleanup' · 21214d55

Jakub Kicinski authored Oct 27, 2021

Mat Martineau says:

====================
mptcp: Rework fwd memory allocation and one cleanup

These patches from the MPTCP tree rework forward memory allocation for
MPTCP (with some supporting changes in the net core), and also clean up
an unused function parameter.

Patch 1 updates TCP code but does not change any behavior, and creates
some macros for reclaim thresholds that will be reused in the MPTCP
code.

Patch 2 adds sk_forward_alloc_get() to the networking core to support
MPTCP's forward allocation with the diag interface.

Patch 3 reworks forward memory for MPTCP.

Patch 4 removes an unused arg and has no functional changes.
====================

Link: https://lore.kernel.org/r/20211026232916.179450-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


mptcp: drop unused sk in mptcp_push_release · b8e0def3

Geliang Tang authored Oct 26, 2021

Since mptcp_set_timeout() was removed from mptcp_push_release() in
commit 33d41c9c ("mptcp: more accurate timeout"), the argument
sk in mptcp_push_release() became useless. Let's drop it.

Fixes: 33d41c9c ("mptcp: more accurate timeout")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


mptcp: allocate fwd memory separately on the rx and tx path · 6511882c

Paolo Abeni authored Oct 26, 2021

The whole mptcp receive path is protected by the msk socket
spinlock. As a consequence, the tx path has to play a few tricks to
allocate the forward memory without acquiring the spinlock multiple
times, making the overall TX path quite complex.

This patch tries to clean up the tx path a bit, using completely
separate fwd memory allocation for the rx and the tx path.

The forward memory allocated in the rx path is now accounted in
msk->rmem_fwd_alloc and is (still) protected by the msk socket spinlock.

To cope with the above we provide a few MPTCP-specific variants for
the helpers to charge, uncharge, reclaim and free the forward memory
in the receive path.

msk->sk_forward_alloc now accounts only the forward memory for the tx
path, so we can use the plain core sock helper to manipulate it and drop
quite a bit of complexity.

On memory pressure, both rx and tx fwd memories are reclaimed.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: introduce sk_forward_alloc_get() · 292e6077

Paolo Abeni authored Oct 26, 2021

A later patch will change the MPTCP memory accounting schema
in such a way that MPTCP sockets will encode the total amount of
forward allocated memory in two separate fields (one for tx and
one for rx).

MPTCP sockets will use their own helper to provide the accurate
amount of fwd allocated memory.

To allow the above, this patch adds a new, optional, sk method to
fetch the fwd memory, wrap the call in a new helper and use it
where it is appropriate.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
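
The shape of such an optional method plus wrapper can be sketched in userspace as below. The model_ structs and functions are simplified assumptions, not the kernel definitions. A protocol that keeps its forward memory in a single field leaves the hook NULL and gets the plain field, while an MPTCP-like protocol reports the sum of its tx and rx fields.

```c
#include <stddef.h>

struct model_sock;

/* Hypothetical protocol ops with one optional method. */
struct model_proto {
	int (*forward_alloc_get)(const struct model_sock *sk); /* may be NULL */
};

/* Hypothetical socket: tx memory in forward_alloc, rx memory kept apart. */
struct model_sock {
	const struct model_proto *prot;
	int forward_alloc;   /* tx side, or the only field for plain protocols */
	int rmem_fwd_alloc;  /* rx side, used by the MPTCP-like protocol only */
};

/* Wrapper used by readers such as a diag interface. */
static int model_sk_forward_alloc_get(const struct model_sock *sk)
{
	if (!sk->prot->forward_alloc_get)
		return sk->forward_alloc;
	return sk->prot->forward_alloc_get(sk);
}

/* MPTCP-like implementation: the total is tx plus rx. */
static int model_mptcp_forward_alloc_get(const struct model_sock *sk)
{
	return sk->forward_alloc + sk->rmem_fwd_alloc;
}

static const struct model_proto model_plain_prot = { NULL };
static const struct model_proto model_mptcp_prot = {
	model_mptcp_forward_alloc_get
};
```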


tcp: define macros for a couple reclaim thresholds · 5823fc96

Paolo Abeni authored Oct 26, 2021

A following patch is going to implement a similar reclaim schema
for the MPTCP protocol, with different locking.

Let's define a couple of macros for the used thresholds, so
that the latter code will be more easily maintainable.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
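
As a sketch of what named thresholds buy, consider the userspace model below. The macro names, their values (2 MiB and 1 MiB) and the model_ struct are illustrative assumptions, not quoted from the patch. Once the limits have names, a second protocol with different locking can apply the same policy without copying magic numbers.

```c
/* Assumed names and values, chosen for illustration only. */
#define MODEL_RECLAIM_THRESHOLD (1 << 21) /* start reclaiming at 2 MiB */
#define MODEL_RECLAIM_CHUNK     (1 << 20) /* give back 1 MiB at a time */

/* Hypothetical per-socket accounting. */
struct model_sock_mem {
	int forward_alloc;     /* memory reserved but currently unused */
	int returned_to_pool;  /* memory handed back to the global pool */
};

/* Uncharge: return memory to the reserve, reclaim once it grows too big. */
static void model_mem_uncharge(struct model_sock_mem *sk, int size)
{
	sk->forward_alloc += size;
	if (sk->forward_alloc >= MODEL_RECLAIM_THRESHOLD) {
		sk->forward_alloc -= MODEL_RECLAIM_CHUNK;
		sk->returned_to_pool += MODEL_RECLAIM_CHUNK;
	}
}
```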
