Commits · 28b34f01a73435a754956ebae826e728c03ffa38 · Kirill Smelkov / linux

09 Jul, 2021 7 commits

net: do not reuse skbuff allocated from skbuff_fclone_cache in the skb cache · 28b34f01

Antoine Tenart authored Jul 09, 2021

Some socket buffers allocated in the fclone cache (in __alloc_skb) can
end-up in the following path[1]:

napi_skb_finish
  __kfree_skb_defer
    napi_skb_cache_put

The issue is napi_skb_cache_put is not fclone friendly and will put
those skbuff in the skb cache to be reused later, although this cache
only expects skbuff allocated from skbuff_head_cache. When this happens
the skbuff is eventually freed using the wrong origin cache, and we can
see traces similar to:

[ 1223.947534] cache_from_obj: Wrong slab cache. skbuff_head_cache but object is from skbuff_fclone_cache
[ 1223.948895] WARNING: CPU: 3 PID: 0 at mm/slab.h:442 kmem_cache_free+0x251/0x3e0
[ 1223.950211] Modules linked in:
[ 1223.950680] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.13.0+ #474
[ 1223.951587] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-3.fc34 04/01/2014
[ 1223.953060] RIP: 0010:kmem_cache_free+0x251/0x3e0

Leading sometimes to other memory related issues.

Fix this by using __kfree_skb for fclone skbuff, similar to what is done
the other place __kfree_skb_defer is called.

[1] At least in setups using veth pairs and tunnels. Building a kernel
    with KASAN we can for example see packets allocated in
    sk_stream_alloc_skb hit the above path and later the issue arises
    when the skbuff is reused.

Fixes: 9243adfc ("skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing")
Cc: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

28b34f01

tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path · 358ed624

Talal Ahmad authored Jul 09, 2021

sk_wmem_schedule makes sure that sk_forward_alloc has enough
bytes for charging that is going to be done by sk_mem_charge.

In the transmit zerocopy path, there is sk_mem_charge but there was
no call to sk_wmem_schedule. This change adds that call.

Without this call to sk_wmem_schedule, sk_forward_alloc can go
negetive which is a bug because sk_forward_alloc is a per-socket
space that has been forward charged so this can't be negative.

Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Talal Ahmad <talalahmad@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

358ed624

net: send SYNACK packet with accepted fwmark · 43b90bfa

Alexander Ovechkin authored Jul 09, 2021

commit e05a90ec ("net: reflect mark on tcp syn ack packets")
fixed IPv4 only.

This part is for the IPv6 side.

Fixes: e05a90ec ("net: reflect mark on tcp syn ack packets")
Signed-off-by: Alexander Ovechkin <ovov@yandex-team.ru>
Acked-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43b90bfa

net: ti: fix UAF in tlan_remove_one · 0336f8ff

Pavel Skripkin authored Jul 09, 2021

priv is netdev private data and it cannot be
used after free_netdev() call. Using priv after free_netdev()
can cause UAF bug. Fix it by moving free_netdev() at the end of the
function.

Fixes: 1e0a8b13 ("tlan: cancel work at remove path")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0336f8ff

net: qcom/emac: fix UAF in emac_remove · ad297cd2

Pavel Skripkin authored Jul 09, 2021

adpt is netdev private data and it cannot be
used after free_netdev() call. Using adpt after free_netdev()
can cause UAF bug. Fix it by moving free_netdev() at the end of the
function.

Fixes: 54e19bc7 ("net: qcom/emac: do not use devm on internal phy pdev")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad297cd2

net: moxa: fix UAF in moxart_mac_probe · c78eaeeb

Pavel Skripkin authored Jul 09, 2021

In case of netdev registration failure the code path will
jump to init_fail label:

init_fail:
	netdev_err(ndev, "init failed\n");
	moxart_mac_free_memory(ndev);
irq_map_fail:
	free_netdev(ndev);
	return ret;

So, there is no need to call free_netdev() before jumping
to error handling path, since it can cause UAF or double-free
bug.

Fixes: 6c821bd9 ("net: Add MOXA ART SoCs ethernet driver")
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c78eaeeb

net: bcmgenet: Ensure all TX/RX queues DMAs are disabled · 2b452550

Florian Fainelli authored Jul 08, 2021

Make sure that we disable each of the TX and RX queues in the TDMA and
RDMA control registers. This is a correctness change to be symmetrical
with the code that enables the TX and RX queues.
Tested-by: Maxime Ripard <maxime@cerno.tech>
Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b452550

08 Jul, 2021 13 commits

Merge branch 'ncsi-phy-link-up' · 5702b81e

David S. Miller authored Jul 08, 2021

Ivan Mikhaylov says:

====================
net/ncsi: Add NCSI Intel OEM command to keep PHY link up

Add NCSI Intel OEM command to keep PHY link up and prevents any channel
resets during the host load on i210. Also includes dummy response handler for
Intel manufacturer id.

Changes from v1:
 1. sparse fixes about casts
 2. put it after ncsi_dev_state_probe_cis instead of
    ncsi_dev_state_probe_channel because sometimes channel is not ready
    after it
 3. inl -> intel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5702b81e

net/ncsi: add dummy response handler for Intel boards · 163f5de5

Ivan Mikhaylov authored Jul 08, 2021

Add the dummy response handler for Intel boards to prevent incorrect
handling of OEM commands.
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

163f5de5

net/ncsi: add NCSI Intel OEM command to keep PHY up · abd2fddc

Ivan Mikhaylov authored Jul 08, 2021

This allows to keep PHY link up and prevents any channel resets during
the host load.

It is KEEP_PHY_LINK_UP option(Veto bit) in i210 datasheet which
block PHY reset and power state changes.
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

abd2fddc

net/ncsi: fix restricted cast warning of sparse · 27fa107d

Ivan Mikhaylov authored Jul 08, 2021

Sparse reports:
net/ncsi/ncsi-rsp.c:406:24: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:732:33: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:756:25: warning: cast to restricted __be32
net/ncsi/ncsi-manage.c:779:25: warning: cast to restricted __be32
Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27fa107d

net: microchip: sparx5: fix kconfig warning · 96248d6d

Randy Dunlap authored Jul 08, 2021

PHY_SPARX5_SERDES depends on OF so SPARX5_SWITCH should also depend
on OF since 'select' does not follow any dependencies.

WARNING: unmet direct dependencies detected for PHY_SPARX5_SERDES
  Depends on [n]: (ARCH_SPARX5 || COMPILE_TEST [=n]) && OF [=n] && HAS_IOMEM [=y]
  Selected by [y]:
  - SPARX5_SWITCH [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_MICROCHIP [=y] && NET_SWITCHDEV [=y] && HAS_IOMEM [=y]

Fixes: 3cfa11ba ("net: sparx5: add the basic sparx5 driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

96248d6d

cxgb4: fix IRQ free race during driver unload · 015fe6fd

Shahjada Abul Husain authored Jul 08, 2021

IRQs are requested during driver's ndo_open() and then later
freed up in disable_interrupts() during driver unload.
A race exists where driver can set the CXGB4_FULL_INIT_DONE
flag in ndo_open() after the disable_interrupts() in driver
unload path checks it, and hence misses calling free_irq().

Fix by unregistering netdevice first and sync with driver's
ndo_open(). This ensures disable_interrupts() checks the flag
correctly and frees up the IRQs properly.

Fixes: b37987e8 ("cxgb4: Disable interrupts and napi before unregistering netdev")
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

015fe6fd

mt76: mt7921: continue to probe driver when fw already downloaded · c3426904

Aaron Ma authored Jul 08, 2021

When reboot system, no power cycles, firmware is already downloaded,
return -EIO will break driver as error:
mt7921e: probe of 0000:03:00.0 failed with error -5

Skip firmware download and continue to probe.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Fixes: 1c099ab4 ("mt76: mt7921: add MCU support")
Signed-off-by: David S. Miller <davem@davemloft.net>

c3426904

atl1c: fix Mikrotik 10/25G NIC detection · b9d233ea

Gatis Peisenieks authored Jul 08, 2021

Since Mikrotik 10/25G NIC MDIO op emulation is not 100% reliable,
on rare occasions it can happen that some physical functions of
the NIC do not get initialized due to timeouted early MDIO op.

This changes the atl1c probe on Mikrotik 10/25G NIC not to
depend on MDIO op emulation.
Signed-off-by: Gatis Peisenieks <gatis@mikrotik.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9d233ea

ptp: Relocate lookup cookie to correct block. · debdd8e3

Jonathan Lemon authored Jul 08, 2021

An earlier commit set the pps_lookup cookie, but the line
was somehow added to the wrong code block.  Correct this.

Fixes: 8602e40f ("ptp: Set lookup cookie when creating a PTP PPS source.")
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Dario Binacchi <dariobin@libero.it>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

debdd8e3

ipv6: tcp: drop silly ICMPv6 packet too big messages · c7bb4b89

Eric Dumazet authored Jul 08, 2021

While TCP stack scales reasonably well, there is still one part that
can be used to DDOS it.

IPv6 Packet too big messages have to lookup/insert a new route,
and if abused by attackers, can easily put hosts under high stress,
with many cpus contending on a spinlock while one is stuck in fib6_run_gc()

ip6_protocol_deliver_rcu()
 icmpv6_rcv()
  icmpv6_notify()
   tcp_v6_err()
    tcp_v6_mtu_reduced()
     inet6_csk_update_pmtu()
      ip6_rt_update_pmtu()
       __ip6_rt_update_pmtu()
        ip6_rt_cache_alloc()
         ip6_dst_alloc()
          dst_alloc()
           ip6_dst_gc()
            fib6_run_gc()
             spin_lock_bh() ...

Some of our servers have been hit by malicious ICMPv6 packets
trying to _increase_ the MTU/MSS of TCP flows.

We believe these ICMPv6 packets are a result of a bug in one ISP stack,
since they were blindly sent back for _every_ (small) packet sent to them.

These packets are for one TCP flow:
09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240
09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240

TCP stack can filter some silly requests :

1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err()
2) tcp_v6_mtu_reduced() can drop requests trying to increase current MSS.

This tests happen before the IPv6 routing stack is entered, thus
removing the potential contention and route exhaustion.

Note that IPv6 stack was performing these checks, but too late
(ie : after the route has been added, and after the potential
garbage collect war)

v2: fix typo caught by Martin, thanks !
v3: exports tcp_mtu_to_mss(), caught by David, thanks !

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7bb4b89

skbuff: Fix build with SKB extensions disabled · 9615fe36

Florian Fainelli authored Jul 07, 2021

We will fail to build with CONFIG_SKB_EXTENSIONS disabled after
8550ff8d ("skbuff: Release nfct refcount on napi stolen or re-used
skbs") since there is an unconditionally use of skb_ext_find() without
an appropriate stub. Simply build the code conditionally and properly
guard against both COFNIG_SKB_EXTENSIONS as well as
CONFIG_NET_TC_SKB_EXT being disabled.

Fixes: Fixes: 8550ff8d ("skbuff: Release nfct refcount on napi stolen or re-used skbs")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9615fe36

ipmr: Fix indentation issue · 92c4bed5

Roy, UjjaL authored Jul 07, 2021

Fixed indentation by removing extra spaces.
Signed-off-by: Roy, UjjaL <royujjal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

92c4bed5

sock: unlock on error in sock_setsockopt() · 271dbc31

Dan Carpenter authored Jul 07, 2021

If copy_from_sockptr() then we need to unlock before returning.

Fixes: d463126e ("net: sock: extend SO_TIMESTAMPING for PHC binding")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

271dbc31

07 Jul, 2021 6 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · d7fba8ff

David S. Miller authored Jul 07, 2021

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Do not refresh timeout in SYN_SENT for syn retransmissions.
   Add selftest for unreplied TCP connection, from Florian Westphal.

2) Fix null dereference from error path with hardware offload
   in nftables.

3) Remove useless nf_ct_gre_keymap_flush() from netns exit path,
   from Vasily Averin.

4) Missing rcu read-lock side in ctnetlink helper info dump,
   also from Vasily.

5) Do not mark RST in the reply direction coming after SYN packet
   for an out-of-sync entry, from Ali Abdallah and Florian Westphal.

6) Add tcp_ignore_invalid_rst sysctl to allow to disable out of
   segment RSTs, from Ali.

7) KCSAN fix for nf_conntrack_all_lock(), from Manfred Spraul.

8) Honor NFTA_LAST_SET in nft_last.

9) Fix incorrect arithmetics when restore last_jiffies in nft_last.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d7fba8ff

selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect · 0e02bf5d

Hangbin Liu authored Jul 07, 2021

After redirecting, it's already a new path. So the old PMTU info should
be cleared. The IPv6 test "mtu exception plus redirect" should only
has redirect info without old PMTU.

The IPv4 test can not be changed because of legacy.

Fixes: ec810535 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e02bf5d

selftests: icmp_redirect: remove from checking for IPv6 route get · 24b671aa

Hangbin Liu authored Jul 07, 2021

If the kernel doesn't enable option CONFIG_IPV6_SUBTREES, the RTA_SRC
info will not be exported to userspace in rt6_fill_node(). And ip cmd will
not print "from ::" to the route output. So remove this check.

Fixes: ec810535 ("selftests: Add redirect tests")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

24b671aa

stmmac: platform: Fix signedness bug in stmmac_probe_config_dt() · eca81f09

YueHaibing authored Jul 07, 2021

The "plat->phy_interface" variable is an enum and in this context GCC
will treat it as an unsigned int so the error handling is never
triggered.

Fixes: b9f0b2f6 ("net: stmmac: platform: fix probe for ACPI devices")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eca81f09

stmmac: dwmac-loongson: Fix unsigned comparison to zero · 0d472c69

YueHaibing authored Jul 07, 2021

plat->phy_interface is unsigned integer, so the condition
can't be less than zero and the warning will never printed.

Fixes: 30bba69d ("stmmac: pci: Add dwmac support for Loongson")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d472c69

netfilter: uapi: refer to nfnetlink_conntrack.h, not nf_conntrack_netlink.h · d322957e

Duncan Roe authored Jul 07, 2021

nf_conntrack_netlink.h does not exist, refer to nfnetlink_conntrack.h instead.
Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d322957e

06 Jul, 2021 14 commits

ipv6: fix 'disable_policy' for fwd packets · ccd27f05

Nicolas Dichtel authored Jul 06, 2021

The goal of commit df789fe7 ("ipv6: Provide ipv6 version of
"disable_policy" sysctl") was to have the disable_policy from ipv4
available on ipv6.
However, it's not exactly the same mechanism. On IPv4, all packets coming
from an interface, which has disable_policy set, bypass the policy check.
For ipv6, this is done only for local packets, ie for packets destinated to
an address configured on the incoming interface.

Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same
effect for both protocols.

My first approach was to create a new kind of route cache entries, to be
able to set DST_NOPOLICY without modifying routes. This would have added a
lot of code. Because the local delivery path is already handled, I choose
to focus on the forwarding path to minimize code churn.

Fixes: df789fe7 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ccd27f05

octeontx2-pf: Fix assigned error return value that is never used · ad1f3797

Colin Ian King authored Jul 06, 2021

Currently when the call to otx2_mbox_alloc_msg_cgx_mac_addr_update fails
the error return variable rc is being assigned -ENOMEM and does not
return early. rc is then re-assigned and the error case is not handled
correctly. Fix this by returning -ENOMEM rather than assigning rc.

Addresses-Coverity: ("Unused value")
Fixes: 79d2be38 ("octeontx2-pf: offload DMAC filters to CGX/RPM block")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad1f3797

Merge branch 'bonding-ipsec' · 5ddef2ad

David S. Miller authored Jul 06, 2021

Taehee Yoo says:

====================
net: fix bonding ipsec offload problems

This series fixes some problems related to bonding ipsec offload.

The 1, 5, and 8th patches are to add a missing rcu_read_lock().
The 2nd patch is to add null check code to bond_ipsec_add_sa.
When bonding interface doesn't have an active real interface, the
bond->curr_active_slave pointer is null.
But bond_ipsec_add_sa() uses that pointer without null check.
So that it results in null-ptr-deref.
The 3 and 4th patches are to replace xs->xso.dev with xs->xso.real_dev.
The 6th patch is to disallow to set ipsec offload if a real interface
type is bonding.
The 7th patch is to add struct bond_ipsec to manage SA.
If bond mode is changed, or active real interface is changed, SA should
be removed from old current active real interface then it should be added
to new active real interface.
But it can't, because it doesn't manage SA.
The 9th patch is to fix incorrect return value of bond_ipsec_offload_ok().

v1 -> v2:
 - Add 9th patch.
 - Do not print warning when there is no SA in bond_ipsec_add_sa_all().
 - Add comment for ipsec_lock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5ddef2ad