Commits · 1f36a72ae347bc922bddf9382b608afa47a25a28 · Kirill Smelkov / linux

22 May, 2022 1 commit

net: sparx5: switchdev: fix typo in comment · 1f36a72a

Julia Lawall authored May 21, 2022

Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f36a72a

21 May, 2022 12 commits

Merge branch 'add-a-bhash2-table-hashed-by-port-address' · aa5334b1

Jakub Kicinski authored May 20, 2022

Joanne Koong says:

====================
Add a bhash2 table hashed by port + address

This patchset proposes adding a bhash2 table that hashes by port and address.
The motivation behind bhash2 is to expedite bind requests in situations where
the port has many sockets in its bhash table entry, which makes checking bind
conflicts costly especially given that we acquire the table entry spinlock
while doing so, which can cause softirq cpu lockups and can prevent new tcp
connections.

We ran into this problem at Meta where the traffic team binds a large number
of IPs to port 443 and the bind() call took a significant amount of time
which led to cpu softirq lockups, which caused packet drops and other failures
on the machine

The patches are as follows:
1/2 - Adds a second bhash table (bhash2) hashed by port and address
2/2 - Adds a test for timing how long an additional bind request takes when
the bhash entry is populated

When experimentally testing this on a local server for ~24k sockets bound to
the port, the results seen were:

ipv4:
before - 0.002317 seconds
with bhash2 - 0.000018 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds
====================

Link: https://lore.kernel.org/r/20220520001834.2247810-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

aa5334b1

selftests: Add test for timing a bind request to a port with a populated bhash entry · 538aaf9b

Joanne Koong authored May 19, 2022

This test populates the bhash table for a given port with
MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long
a bind request on the port takes.

When populating the bhash table, we create the sockets and then bind
the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT
are set). When timing how long a bind on the port takes, we bind on a
different address without SO_REUSEPORT set. We do not set SO_REUSEPORT
because we are interested in the case where the bind request does not
go through the tb->fastreuseport path, which is fragile (eg
tb->fastreuseport path does not work if binding with a different uid).

To run the test locally, I did:
* ulimit -n 65535000
* ip addr add 2001:0db8:0:f101::1 dev eth0
* ./bind_bhash_test 443
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

538aaf9b

net: Add a second bind table hashed by port and address · d5a42de8

Joanne Koong authored May 19, 2022

We currently have one tcp bind table (bhash) which hashes by port
number only. In the socket bind path, we check for bind conflicts by
traversing the specified port's inet_bind2_bucket while holding the
bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()).

In instances where there are tons of sockets hashed to the same port
at different addresses, checking for a bind conflict is time-intensive
and can cause softirq cpu lockups, as well as stops new tcp connections
since __inet_inherit_port() also contests for the spinlock.

This patch proposes adding a second bind table, bhash2, that hashes by
port and ip address. Searching the bhash2 table leads to significantly
faster conflict resolution and less time holding the spinlock.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d5a42de8

wwan: iosm: use a flexible array rather than allocate short objects · eac67d83

Jakub Kicinski authored May 19, 2022

GCC array-bounds warns that ipc_coredump_get_list() under-allocates
the size of struct iosm_cd_table *cd_table.

This is avoidable - we just need a flexible array. Nothing calls
sizeof() on struct iosm_cd_list or anything that contains it.
Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Link: https://lore.kernel.org/r/20220520060013.2309497-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

eac67d83

stcp: Use memset_after() to zero sctp_stream_out_ext · 29849a48

Xiu Jianfeng authored May 19, 2022

Use memset_after() helper to simplify the code, there is no functional
change in this patch.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220519062932.249926-1-xiujianfeng@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

29849a48

net: mscc: fix the alignment in ocelot_port_fdb_del() · f7b5a89c

Alaa Mohamed authored May 20, 2022

align the extack argument of the ocelot_port_fdb_del()
function.
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520002040.4442-1-eng.alaamohamedsoliman.am@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

f7b5a89c

net: vxlan: Fix kernel coding style · c2e10f53

Alaa Mohamed authored May 20, 2022

The continuation line does not align with the opening bracket
and this patch fix it.
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520003614.6073-1-eng.alaamohamedsoliman.am@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c2e10f53

eth: bnxt: make ulp_id unsigned to make GCC 12 happy · dbb2f362

Jakub Kicinski authored May 19, 2022

GCC array bounds checking complains that ulp_id is validated
only against upper bound. Make it unsigned.
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220520061955.2312968-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dbb2f362

selftests: fib_nexthops: Make ping timeout configurable · 5feba472

Amit Cohen authored May 19, 2022

Commit 49bb39bd ("selftests: fib_nexthops: Make the test more robust")
increased the timeout of ping commands to 5 seconds, to make the test
more robust. Make the timeout configurable using '-w' argument to allow
user to change it depending on the system that runs the test. Some systems
suffer from slow forwarding performance, so they may need to change the
timeout.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220519070921.3559701-1-amcohen@nvidia.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5feba472

net: wwan: t7xx: use GFP_ATOMIC under spin lock in t7xx_cldma_gpd_set_next_ptr() · 9ee152ee

Yang Yingliang authored May 19, 2022

Sometimes t7xx_cldma_gpd_set_next_ptr() is called under spin lock,
so add 'gfp_mask' parameter in t7xx_cldma_gpd_set_next_ptr() to pass
the flag.

Fixes: 39d43904 ("net: wwan: t7xx: Add control DMA interface")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20220519032108.2996400-1-yangyingliang@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9ee152ee

net: tulip: fix build with CONFIG_GSC · dc2df00a

Rolf Eike Beer authored May 20, 2022

Fix typo which breaks build for parisc.

Fixes: 3daebfbe ("net: tulip: convert to devres")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/all/CA+G9fYuCzU5VZ_nc+6NEdBXJdVCH=J2SB1Na1G_NS_0BNdGYtg@mail.gmail.com/Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Link: https://lore.kernel.org/r/4719560.GXAFRqVoOG@eto.sf-tec.deSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dc2df00a

net: avoid strange behavior with skb_defer_max == 1 · c09b0cd2

Jakub Kicinski authored May 18, 2022

When user sets skb_defer_max to 1 the kick threshold is 0
(half of 1). If we increment queue length before the check
the kick will never happen, and the skb may get stranded.
This is likely harmless but can be avoided by moving the
increment after the check. This way skb_defer_max == 1
will always kick. Still a silly config to have, but
somehow that feels more correct.

While at it drop a comment which seems to be outdated
or confusing, and wrap the defer_count write with
a WRITE_ONCE() since it's read on the fast path
that avoids taking the lock.
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220518185522.2038683-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c09b0cd2

20 May, 2022 27 commits

sfc/siena: Remove duplicate check on segments · cc398a34

Martin Habets authored May 19, 2022

Siena only supports software TSO. This means more code can be deleted,
as pointed out by the Smatch static checker warning:
drivers/net/ethernet/sfc/siena/tx.c:184 __efx_siena_enqueue_skb()
warn: duplicate check 'segments' (previous on line 158)

Fixes: 956f2d86 ("sfc/siena: Remove build references to missing functionality")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/kernel-janitors/YoH5tJMnwuGTrn1Z@kili/Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/165294463549.23865.4557617334650441347.stgit@palantir17.mph.netSigned-off-by: Jakub Kicinski <kuba@kernel.org>

cc398a34

tcp_ipv6: set the drop_reason in the right place · dc776924

Jakub Kicinski authored May 19, 2022

Looks like the IPv6 version of the patch under Fixes was
a copy/paste of the IPv4 but hit the wrong spot.
It is tcp_v6_rcv() which uses drop_reason as a boolean, and
needs to be protected against reason == 0 before calling free.
tcp_v6_do_rcv() has a pretty straightforward flow.

The resulting warning looks like this:
  WARNING: CPU: 1 PID: 0 at net/core/skbuff.c:775
  Call Trace:
    tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1767)
    ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438)
    ip6_input_finish (include/linux/rcupdate.h:726)
    ip6_input (include/linux/netfilter.h:307)

Fixes: f8319dfd ("net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv()")
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220520021347.2270207-1-kuba@kernel.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

dc776924

Merge branch 'net-ipa-next' · b6d26144

David S. Miller authored May 20, 2022

Alex Elder says:

====================
net: ipa: a mix of patches

This series includes a mix of things things that are generally
minor.  The first four are sort of unrelated fixes, and summarizing
them here wouldn't be that helpful.

The last three together make it so only the "configuration data" we
need after initialization is saved for later use.  Most such data is
used only during driver initialization.  But endpoint configuration
is needed later, so the last patch saves a copy of that.  Eventually
we'll want to support reconfiguring endpoints at runtime as well,
and this will facilitate that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b6d26144