Commits · 3a26babb418362de060811fc7b077088ba650f7f · Kirill Smelkov / linux

28 Oct, 2021 32 commits

Merge tag 'mlx5-net-next-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 3a26babb

David S. Miller authored Oct 28, 2021

Saeed Mahameed says:

====================
Merge mlx5-next into net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

3a26babb

Merge branch 'mvpp2-phylink' · 1feef2de

David S. Miller authored Oct 28, 2021

Russell King says:

====================
Convert mvpp2 to phylink supported_interfaces

This patch series converts mvpp2 to use phylinks supported_interfaces
bitmap to simplify the validate() implementation. The patches:

1) Add the supported interface modes the supported_interfaces bitmap.
2) Removes the checks for the interface type being supported from
   the validate callback
3) Removes the now unnecessary checks and call to
   phylink_helper_basex_speed() to support switching between
   1000base-X and 2500base-X for SFPs
4) Cleans up the resulting validate() code.

(3) becomes possible because when asking the MAC for its complete
support, we walk all supported interfaces which will include 1000base-X
and 2500base-X only if the comphy is present.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1feef2de

net: mvpp2: clean up mvpp2_phylink_validate() · b63f1117

Russell King (Oracle) authored Oct 27, 2021

mvpp2_phylink_validate() no longer needs to check for
PHY_INTERFACE_MODE_NA as phylink will walk the supported interface
types to discover the link mode capabilities. Remove these checks.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

b63f1117

net: mvpp2: drop use of phylink_helper_basex_speed() · 76947a63

Russell King (Oracle) authored Oct 27, 2021

Now that we have a better method to select SFP interface modes, we
no longer need to use phylink_helper_basex_speed() in a driver's
validation function, and we can also get rid of our hack to indicate
both 1000base-X and 2500base-X if the comphy is present to make that
work. Remove this hack and use of phylink_helper_basex_speed().
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

76947a63

net: mvpp2: remove interface checks in mvpp2_phylink_validate() · 6c0c4b7a

Russell King (Oracle) authored Oct 27, 2021

As phylink checks the interface mode against the supported_interfaces
bitmap, we no longer need to validate the interface mode in the
validation function. Remove this to simplify it.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c0c4b7a

net: mvpp2: populate supported_interfaces member · 8498e17e

Russell King authored Oct 27, 2021

Populate the phy interface mode bitmap for the Marvell mvpp2 driver
with interfaces modes supported by the MAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

8498e17e

ipv6: enable net.ipv6.route.max_size sysctl in network namespace · 06e6c88f

Alexander Kuznetsov authored Oct 27, 2021

We want to increase route cache size in network namespace
created with user namespace. Currently ipv6 route settings
are disabled for non-initial network namespaces.
We can allow this sysctl and it will be safe since
commit <6126891c> because route cache account to kmem,
that is why users from user namespace can not DOS system.
Signed-off-by: Alexander Kuznetsov <wwfq@yandex-team.ru>
Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
Acked-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

06e6c88f

mpt fusion: use dev_addr_set() · e0b4f1cd

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0b4f1cd

firewire: don't write directly to netdev->dev_addr · aaaaa137

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Prepare fwnet_hwaddr on the stack and use dev_addr_set() to copy
it to netdev->dev_addr. We no longer need to worry about alignment.
union fwnet_hwaddr does not have any padding and we set all fields
so we don't need to zero it upfront.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

aaaaa137

media: use eth_hw_addr_set() · 707182e4

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Convert media from memcpy(... 6) and memcpy(... addr_len) to
eth_hw_addr_set():

  @@
  expression dev, np;
  @@
  - memcpy(dev->dev_addr, np, 6)
  + eth_hw_addr_set(dev, np)
  @@
  - memcpy(dev->dev_addr, np, dev->addr_len)
  + eth_hw_addr_set(dev, np)

Make sure we don't cast off const qualifier from dev->dev_addr.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

707182e4

Merge branch 'tcp-tx-side-cleanups' · 701b9519

David S. Miller authored Oct 28, 2021

Eric Dumazet says:

====================
tcp: tx side cleanups

We no longer need to set skb->reserved_tailroom because
TCP sendmsg() do not put payload in skb->head anymore.

Also do some cleanups around skb->ip_summed/csum,
and CP_SKB_CB(skb)->sacked for fresh skbs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

701b9519

tcp: do not clear TCP_SKB_CB(skb)->sacked if already zero · 8b7d8c2b

Eric Dumazet authored Oct 27, 2021

Freshly allocated skbs have zero in skb->cb[] already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8b7d8c2b

tcp: do not clear skb->csum if already zero · 4f226674

Eric Dumazet authored Oct 27, 2021

Freshly allocated skbs have their csum field cleared already.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f226674

tcp: factorize ip_summed setting · a52fe46e

Eric Dumazet authored Oct 27, 2021

Setting skb->ip_summed to CHECKSUM_PARTIAL can be centralized
in tcp_stream_alloc_skb() and __mptcp_do_alloc_tx_skb()
instead of being done multiple times.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a52fe46e

tcp: no longer set skb->reserved_tailroom · f401da47

Eric Dumazet authored Oct 27, 2021

TCP/MPTCP sendmsg() no longer puts payload in skb->head,
we can remove not needed code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f401da47

tcp: remove dead code from tcp_collapse_retrans() · bd446314

Eric Dumazet authored Oct 27, 2021

TCP sendmsg() no longer puts payload in skb->head,
remove some dead code from tcp_collapse_retrans().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd446314

tcp: cleanup tcp_remove_empty_skb() use · 27728ba8

Eric Dumazet authored Oct 27, 2021

All tcp_remove_empty_skb() callers now use tcp_write_queue_tail()
for the skb argument, we can therefore factorize code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27728ba8

tcp: remove dead code from tcp_sendmsg_locked() · 3ded97bc

Eric Dumazet authored Oct 27, 2021

TCP sendmsg() no longer puts payload in skb head, we can remove
dead code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ded97bc

Merge branch 'mlx5-next' of... · 573bce9e

Saeed Mahameed authored Oct 27, 2021

Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

573bce9e

net: phy: Fix unsigned comparison with less than zero · 911e3a46

Jiapeng Chong authored Oct 27, 2021

Fix the following coccicheck warning:

./drivers/net/phy/at803x.c:493:5-10: WARNING: Unsigned expression
compared with zero: value < 0.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: 7beecaf7 ("net: phy: at803x: improve the WOL feature")
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/1635325191-101815-1-git-send-email-jiapeng.chong@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

911e3a46

Merge branch 'mptcp-rework-fwd-memory-allocation-and-one-cleanup' · 21214d55

Jakub Kicinski authored Oct 27, 2021

Mat Martineau says:

====================
mptcp: Rework fwd memory allocation and one cleanup

These patches from the MPTCP tree rework forward memory allocation for
MPTCP (with some supporting changes in the net core), and also clean up
an unused function parameter.

Patch 1 updates TCP code but does not change any behavior, and creates
some macros for reclaim thresholds that will be reused in the MPTCP
code.

Patch 2 adds sk_forward_alloc_get() to the networking core to support
MPTCP's forward allocation with the diag interface.

Patch 3 reworks forward memory for MPTCP.

Patch 4 removes an unused arg and has no functional changes.
====================

Link: https://lore.kernel.org/r/20211026232916.179450-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

21214d55

mptcp: drop unused sk in mptcp_push_release · b8e0def3

Geliang Tang authored Oct 26, 2021

Since mptcp_set_timeout() had removed from mptcp_push_release() in
commit 33d41c9c ("mptcp: more accurate timeout"), the argument
sk in mptcp_push_release() became useless. Let's drop it.

Fixes: 33d41c9c ("mptcp: more accurate timeout")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b8e0def3

mptcp: allocate fwd memory separately on the rx and tx path · 6511882c

Paolo Abeni authored Oct 26, 2021

All the mptcp receive path is protected by the msk socket
spinlock. As consequences, the tx path has to play a few tricks to
allocate the forward memory without acquiring the spinlock multiple
times, making the overall TX path quite complex.

This patch tries to clean-up a bit the tx path, using completely
separated fwd memory allocation, for the rx and the tx path.

The forward memory allocated in the rx path is now accounted in
msk->rmem_fwd_alloc and is (still) protected by the msk socket spinlock.

To cope with the above we provide a few MPTCP-specific variants for
the helpers to charge, uncharge, reclaim and free the forward memory
in the receive path.

msk->sk_forward_alloc now accounts only the forward memory for the tx
path, we can use the plain core sock helper to manipulate it and drop
quite a bit of complexity.

On memory pressure, both rx and tx fwd memories are reclaimed.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6511882c

net: introduce sk_forward_alloc_get() · 292e6077

Paolo Abeni authored Oct 26, 2021

A later patch will change the MPTCP memory accounting schema
in such a way that MPTCP sockets will encode the total amount of
forward allocated memory in two separate fields (one for tx and
one for rx).

MPTCP sockets will use their own helper to provide the accurate
amount of fwd allocated memory.

To allow the above, this patch adds a new, optional, sk method to
fetch the fwd memory, wrap the call in a new helper and use it
where it is appropriate.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

292e6077

tcp: define macros for a couple reclaim thresholds · 5823fc96

Paolo Abeni authored Oct 26, 2021

A following patch is going to implement a similar reclaim schema
for the MPTCP protocol, with different locking.

Let's define a couple of macros for the used thresholds, so
that the latter code will be more easily maintainable.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5823fc96

inet: remove races in inet{6}_getname() · 9dfc685e

Eric Dumazet authored Oct 26, 2021

syzbot reported data-races in inet_getname() multiple times,
it is time we fix this instead of pretending applications
should not trigger them.

getsockname() and getpeername() are not really considered fast path.

v2: added the missing BPF_CGROUP_RUN_SA_PROG() declaration
    needed when CONFIG_CGROUP_BPF=n, as reported by
    kernel test robot <lkp@intel.com>

syzbot typical report:

BUG: KCSAN: data-race in __inet_hash_connect / inet_getname

write to 0xffff888136d66cf8 of 2 bytes by task 14374 on cpu 1:
 __inet_hash_connect+0x7ec/0x950 net/ipv4/inet_hashtables.c:831
 inet_hash_connect+0x85/0x90 net/ipv4/inet_hashtables.c:853
 tcp_v4_connect+0x782/0xbb0 net/ipv4/tcp_ipv4.c:275
 __inet_stream_connect+0x156/0x6e0 net/ipv4/af_inet.c:664
 inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:728
 __sys_connect_file net/socket.c:1896 [inline]
 __sys_connect+0x254/0x290 net/socket.c:1913
 __do_sys_connect net/socket.c:1923 [inline]
 __se_sys_connect net/socket.c:1920 [inline]
 __x64_sys_connect+0x3d/0x50 net/socket.c:1920
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff888136d66cf8 of 2 bytes by task 14408 on cpu 0:
 inet_getname+0x11f/0x170 net/ipv4/af_inet.c:790
 __sys_getsockname+0x11d/0x1b0 net/socket.c:1946
 __do_sys_getsockname net/socket.c:1961 [inline]
 __se_sys_getsockname net/socket.c:1958 [inline]
 __x64_sys_getsockname+0x3e/0x50 net/socket.c:1958
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x44/0xa0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x0000 -> 0xdee0

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14408 Comm: syz-executor.3 Not tainted 5.15.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211026213014.3026708-1-eric.dumazet@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9dfc685e

xdp: Remove redundant warning · b859a360

Yajun Deng authored Oct 27, 2021

There is a warning in xdp_rxq_info_unreg_mem_model() when reg_state isn't
equal to REG_STATE_REGISTERED, so the warning in xdp_rxq_info_unreg() is
redundant.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20211027013856.1866-1-yajun.deng@linux.devSigned-off-by: Jakub Kicinski <kuba@kernel.org>

b859a360

net: thunderbolt: use eth_hw_addr_set() · 5a48585d

Jakub Kicinski authored Oct 26, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://lore.kernel.org/r/20211026175547.3198242-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5a48585d

staging: use of_get_ethdev_address() · 8b6ce9b0

Jakub Kicinski authored Oct 26, 2021

Use the new of_get_ethdev_address() helper for the cases
where dev->dev_addr is passed in directly as the destination.

  @@
  expression dev, np;
  @@
  - of_get_mac_address(np, dev->dev_addr)
  + of_get_ethdev_address(np, dev)
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20211026175038.3197397-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8b6ce9b0

net: macb: Fix mdio child node detection · 8db3cbc5

Guenter Roeck authored Oct 26, 2021

Commit 4d98bb0d ("net: macb: Use mdio child node for MDIO bus if it
exists") added code to detect if a 'mdio' child node exists to the macb
driver. Ths added code does, however, not actually check if the child node
exists, but if the parent node exists. This results in errors such as

macb 10090000.ethernet eth0: Could not attach PHY (-19)

if there is no 'mdio' child node. Fix the code to actually check for
the child node.

Fixes: 4d98bb0d ("net: macb: Use mdio child node for MDIO bus if it exists")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Sean Anderson <sean.anderson@seco.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Link: https://lore.kernel.org/r/20211026173950.353636-1-linux@roeck-us.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8db3cbc5

net: sch: simplify condtion for selecting mini_Qdisc_pair buffer · 85c0c3eb

Seth Forshee authored Oct 26, 2021

The only valid values for a miniq pointer are NULL or a pointer to
miniq1 or miniq2, so testing for miniq_old != &miniq1 is functionally
equivalent to testing that it is NULL or equal to &miniq2.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Link: https://lore.kernel.org/r/20211026183721.137930-1-seth@forshee.meSigned-off-by: Jakub Kicinski <kuba@kernel.org>

85c0c3eb

net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap() · 26746382

Seth Forshee authored Oct 26, 2021

Currently rcu_barrier() is used to ensure that no readers of the
inactive mini_Qdisc buffer remain before it is reused. This waits for
any pending RCU callbacks to complete, when all that is actually
required is to wait for one RCU grace period to elapse after the buffer
was made inactive. This means that using rcu_barrier() may result in
unnecessary waits.

To improve this, store the current RCU state when a buffer is made
inactive and use poll_state_synchronize_rcu() to check whether a full
grace period has elapsed before reusing it. If a full grace period has
not elapsed, wait for a grace period to elapse, and in the non-RT case
use synchronize_rcu_expedited() to hasten it.

Since this approach eliminates the RCU callback it is no longer
necessary to synchronize_rcu() in the tp_head==NULL case. However, the
RCU state should still be saved for the previously active buffer.

Before this change I would typically see mini_qdisc_pair_swap() take
tens of milliseconds to complete. After this change it typcially
finishes in less than 1 ms, and often it takes just a few microseconds.

Thanks to Paul for walking me through the options for improving this.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Link: https://lore.kernel.org/r/20211026130700.121189-1-seth@forshee.meSigned-off-by: Jakub Kicinski <kuba@kernel.org>

26746382

27 Oct, 2021 8 commits

net: sched: gred: dynamically allocate tc_gred_qopt_offload · f25c0515

Arnd Bergmann authored Oct 26, 2021

The tc_gred_qopt_offload structure has grown too big to be on the
stack for 32-bit architectures after recent changes.

net/sched/sch_gred.c:903:13: error: stack frame size (1180) exceeds limit (1024) in 'gred_destroy' [-Werror,-Wframe-larger-than]
net/sched/sch_gred.c:310:13: error: stack frame size (1212) exceeds limit (1024) in 'gred_offload' [-Werror,-Wframe-larger-than]

Use dynamic allocation per qdisc to avoid this.

Fixes: 50dc9a85 ("net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types")
Fixes: 67c9e627 ("net: sched: Protect Qdisc::bstats with u64_stats")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20211026100711.nalhttf6mbe6sudx@linutronix.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f25c0515

Merge branch 'two-reverts-to-calm-down-devlink-discussion' · 4796e251

Jakub Kicinski authored Oct 27, 2021

Leon Romanovsky says:

====================
Two reverts to calm down devlink discussion

Two reverts as was discussed in [1], fast, easy and wrong in long run
solution to syzkaller bug [2].

[1] https://lore.kernel.org/all/20211026120234.3408fbcc@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com
[2] https://lore.kernel.org/netdev/000000000000af277405cf0a7ef0@google.com/
====================

Link: https://lore.kernel.org/r/cover.1635276828.git.leonro@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

4796e251

Revert "devlink: Remove not-executed trap policer notifications" · c5e0321e

Leon Romanovsky authored Oct 26, 2021

This reverts commit 22849b5e as it
revealed that mlxsw and netdevsim (copy/paste from mlxsw) reregisters
devlink objects during another devlink user triggered command.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c5e0321e

Revert "devlink: Remove not-executed trap group notifications" · fb9d19c2

Leon Romanovsky authored Oct 26, 2021

This reverts commit 8bbeed48 as it
revealed that mlxsw and netdevsim (copy/paste from mlxsw) reregisters
devlink objects during another devlink user triggered command.

Fixes: 22849b5e ("devlink: Remove not-executed trap policer notifications")
Reported-by: syzbot+93d5accfaefceedf43c1@syzkaller.appspotmail.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fb9d19c2

Merge branch 'br-fdb-refactoring' · 6487c819

David S. Miller authored Oct 27, 2021

Vladimir Oltean says:

====================
Bridge FDB refactoring

This series refactors the br_fdb.c, br_switchdev.c and switchdev.c files
to offer the same level of functionality with a bit less code, and to
clarify the purpose of some functions.

No functional change intended.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6487c819

net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device · 716a30a9

Vladimir Oltean authored Oct 26, 2021

To reduce code churn, the same patch makes multiple changes, since they
all touch the same lines:

1. The implementations for these two are identical, just with different
   function pointers. Reduce duplications and name the function pointers
   "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.

2. Drop the "const" attribute from "orig_dev". If the driver needs to
   check whether orig_dev belongs to itself and then
   call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it
   can't, because call_switchdev_notifiers takes a non-const struct
   net_device *.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

716a30a9

net: bridge: create a common function for populating switchdev FDB entries · fab9eca8

Vladimir Oltean authored Oct 26, 2021

There are two places where a switchdev FDB entry is constructed, one is
br_switchdev_fdb_notify() and the other is br_fdb_replay(). One uses a
struct initializer, and the other declares the structure as
uninitialized and populates the elements one by one.

One problem when introducing new members of struct
switchdev_notifier_fdb_info is that there is a risk for one of these
functions to run with an uninitialized value.

So centralize the logic of populating such structure into a dedicated
function. Being the primary location where these structures are created,
using an uninitialized variable and populating the members one by one
should be fine, since this one function is supposed to assign values to
all its members.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fab9eca8

net: bridge: move br_fdb_replay inside br_switchdev.c · 5cda5272

Vladimir Oltean authored Oct 26, 2021

br_fdb_replay is only called from switchdev code paths, so it makes
sense to be disabled if switchdev is not enabled in the first place.

As opposed to br_mdb_replay and br_vlan_replay which might be turned off
depending on bridge support for multicast and VLANs, FDB support is
always on. So moving br_mdb_replay and br_vlan_replay inside
br_switchdev.c would mean adding some #ifdef's in br_switchdev.c, so we
keep those where they are.

The reason for the movement is that in future changes there will be some
code reuse between br_switchdev_fdb_notify and br_fdb_replay.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5cda5272