Commits · 069254662b657bd602fc9fe97efa4ebc3151df46 · Kirill Smelkov / linux

15 Dec, 2020 40 commits

mlxsw: reg: Add Router LPM Cache Enable Register · 06925466

Jiri Pirko authored Dec 14, 2020

The RLPMCE allows disabling the LPM cache. Can be changed on the fly.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

06925466

mlxsw: reg: Add Router LPM Cache ML Delete Register · edb47f3d

Jiri Pirko authored Dec 14, 2020

The RLCMLD register is used to bulk delete the XLT-LPM cache ML entries.
This can be used by SW when L is increased or decreased, thus need to
remove entries with old ML values.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

edb47f3d

mlxsw: spectrum_router_xm: Implement L-value tracking for M-index · 54ff9dbb

Jiri Pirko authored Dec 14, 2020

There is a table that assigns L-value per M-index. The L is always the
biggest from the currently inserted prefixes. Setup a hashtable to track
the M-index information and the prefixes that are related to it. Ensure
the L-value is always correctly set.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

54ff9dbb

mlxsw: reg: Add XM Router M Table Register · e35e8046

Jiri Pirko authored Dec 14, 2020

The XRMT configures the M-Table for the XLT-LPM.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e35e8046

mlxsw: spectrum_router: Introduce per-ASIC XM initialization · e0bc244d

Jiri Pirko authored Dec 14, 2020

During the router init flow, call into XM code and initialize couple of
items needed for XM functionality:

1) Query the capabilities and sizes. Check the XM device id.
2) Initialize the M-value. Note that currently the M-value is set fixed
   to 16 for IPv4. In future this may change to better cover the actual
   inserted routes.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e0bc244d

mlxsw: reg: Add XM Lookup Table Query Register · ec54677e

Jiri Pirko authored Dec 14, 2020

The XLTQ is used to query HW for XM-related info.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ec54677e

mlxsw: reg: Add Router XLT M select Register · 087489dc

Jiri Pirko authored Dec 14, 2020

The RXLTM configures and selects the M for the XM lookups.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

087489dc

mlxsw: Ignore ports that are connected to eXtended mezanine · 50779c33

Jiri Pirko authored Dec 14, 2020

Use the info stored in the bus_info struct about the eXtended mezanine
connected ports and don't expose them.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

50779c33

mlxsw: pci: Obtain info about ports used by eXtended mezanine · 2ea3f4c7

Jiri Pirko authored Dec 14, 2020

The output of boardinfo command was extended to contain information
about XM. Indicates if is present and in case it is, tells which
localports are used for the connection. So parse this info and store it
in bus_info passed up to the driver.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2ea3f4c7

mlxsw: spectrum_router: Introduce XM implementation of router low-level ops · ff462103

Jiri Pirko authored Dec 14, 2020

In order to offload entries to XM, implement a set of low-level
functions to work with LPM trees in XM and also to pack and write
FIB entries into XM.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ff462103

mlxsw: reg: Add Router XLT Enable Register · 6100fbf1

Jiri Pirko authored Dec 14, 2020

The RXLTE enables XLT (eXtended Lookup Table) LPM lookups if a capable
XM is present on the system.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6100fbf1

mlxsw: reg: Add XM Direct Register · be6ba3b6

Jiri Pirko authored Dec 14, 2020

The XMDR allows direct access to the XM device via the switch.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

be6ba3b6

Merge branch 'bnxt_en-improve-firmware-flashing' · 22f07b86

Jakub Kicinski authored Dec 14, 2020

Michael Chan says:

====================
bnxt_en: Improve firmware flashing.

This patchset improves firmware flashing in 2 ways:

- If firmware returns NO_SPACE error during flashing, the driver will
create the UPDATE directory with more staging area and retry.
- Instead of allocating a big DMA buffer for the entire contents of
the firmware package size, fallback to a smaller buffer to DMA the
contents in multiple DMA operations.
====================

Link: https://lore.kernel.org/r/1607860306-17244-1-git-send-email-michael.chan@broadcom.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

22f07b86

bnxt_en: Enable batch mode when using HWRM_NVM_MODIFY to flash packages. · a86b313e

Michael Chan authored Dec 13, 2020

The current scheme allocates a DMA buffer as big as the requested
firmware package file and DMAs the contents to firmware in one
operation.  The buffer size can be several hundred kilo bytes and
the driver may not be able to allocate the memory.  This will cause
firmware upgrade to fail.

Improve the scheme by using smaller DMA blocks and calling firmware to
DMA each block in a batch mode.  Older firmware can cause excessive
NVRAM erases if the block size is too small so we try to allocate a
256K buffer to begin with and size it down successively if we cannot
allocate the memory.
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a86b313e

bnxt_en: Retry installing FW package under NO_SPACE error condition. · 1432c3f6

Pavan Chebbi authored Dec 13, 2020

In bnxt_flash_package_from_fw_obj(), if firmware returns the NO_SPACE
error, call __bnxt_flash_nvram() to create the UPDATE directory and
then loop back and retry one more time.

Since the first try may fail, we use the silent version to send the
firmware commands.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1432c3f6

bnxt_en: Restructure bnxt_flash_package_from_fw_obj() to execute in a loop. · 2e5fb428

Pavan Chebbi authored Dec 13, 2020

On NICs with a smaller NVRAM, FW installation may fail after multiple
updates due to fragmentation.  The driver can retry when FW returns
a special error code.  To faciliate the retry, we restructure the
logic that performs the flashing in a loop.  The actual retry logic
will be added in the next patch.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2e5fb428

bnxt_en: Rearrange the logic in bnxt_flash_package_from_fw_obj(). · a9094ba6

Michael Chan authored Dec 13, 2020

This function will be modified in the next patch to retry flashing
the firmware in a loop.  To facilate that, we rearrange the code so
that the steps that only need to be done once before the loop will be
moved to the top of the function.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a9094ba6

bnxt_en: Refactor bnxt_flash_nvram. · 93ff3435

Pavan Chebbi authored Dec 13, 2020

Refactor bnxt_flash_nvram() into __bnxt_flash_nvram() that takes an
additional dir_item_len parameter.  The new function will be used
in subsequent patches with the dir_item_len parameter set to create
the UPDATE directory during flashing.
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

93ff3435

MAINTAINERS: add mvpp2 driver entry · 2aa899eb

Marcin Wojtas authored Dec 11, 2020

Since its creation Marvell NIC driver for Armada 375/7k8k and
CN913x SoC families mvpp2 has been lacking an entry in MAINTAINERS,
which sometimes lead to unhandled bugs that persisted
across several kernel releases.
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20201211165114.26290-1-mw@semihalf.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

2aa899eb

net: drop bogus skb with CHECKSUM_PARTIAL and offset beyond end of trimmed packet · 54970a2f

Vasily Averin authored Dec 14, 2020

syzbot reproduces BUG_ON in skb_checksum_help():
tun creates (bogus) skb with huge partial-checksummed area and
small ip packet inside. Then ip_rcv trims the skb based on size
of internal ip packet, after that csum offset points beyond of
trimmed skb. Then checksum_tg() called via netfilter hook
triggers BUG_ON:

        offset = skb_checksum_start_offset(skb);
        BUG_ON(offset >= skb_headlen(skb));

To work around the problem this patch forces pskb_trim_rcsum_slow()
to return -EINVAL in described scenario. It allows its callers to
drop such kind of packets.

Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0
Reported-by: syzbot+7010af67ced6105e5ab6@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1b2494af-2c56-8ee2-7bc0-923fcad1cdf8@virtuozzo.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

54970a2f

inet_ecn: Use csum16_add() helper for IP_ECN_set_* helpers · 0780b414

Toke Høiland-Jørgensen authored Dec 11, 2020

Jakub pointed out that the IP_ECN_set* helpers basically open-code
csum16_add(), so let's switch them over to using the helper instead.

v2:
- Use __be16 for check_add stack variable in IP_ECN_set_ce() (kbot)
v3:
- Turns out we need __force casts to do arithmetic on __be16 types
Reported-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Jonathan Morton <chromatix99@gmail.com>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20201211142638.154780-1-toke@redhat.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0780b414

net: bridge: Fix a warning when del bridge sysfs · 989a1db0

Wang Hai authored Dec 11, 2020

I got a warining report:

br_sysfs_addbr: can't create group bridge4/bridge
------------[ cut here ]------------
sysfs group 'bridge' not found for kobject 'bridge4'
WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline]
WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270
Modules linked in: iptable_nat
...
Call Trace:
  br_dev_delete+0x112/0x190 net/bridge/br_if.c:384
  br_dev_newlink net/bridge/br_netlink.c:1381 [inline]
  br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362
  __rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441
  rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
  rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562
  netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494
  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
  netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330
  netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919
  sock_sendmsg_nosec net/socket.c:651 [inline]
  sock_sendmsg+0x139/0x170 net/socket.c:671
  ____sys_sendmsg+0x658/0x7d0 net/socket.c:2353
  ___sys_sendmsg+0xf8/0x170 net/socket.c:2407
  __sys_sendmsg+0xd3/0x190 net/socket.c:2440
  do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

In br_device_event(), if the bridge sysfs fails to be added,
br_device_event() should return error. This can prevent warining
when removing bridge sysfs that do not exist.

Fixes: bb900b27 ("bridge: allow creating bridge devices with netlink")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20201211122921.40386-1-wanghai38@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

989a1db0

ice, xsk: Move Rx allocation out of while-loop · 5bb0c4b5

Björn Töpel authored Dec 11, 2020

Instead doing the check for allocation in each loop, move it outside
the while loop and do it every NAPI loop.

This change boosts the xdpsock rxdrop scenario with 15% more
packets-per-second.
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/r/20201211085410.59350-1-bjorn.topel@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

5bb0c4b5

net: mtk_eth: simplify the mediatek code return expression · bb7eae6d

Zheng Yongjun authored Dec 11, 2020

Simplify the return expression at mtk_eth_path.c file, simplify this all.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Link: https://lore.kernel.org/r/20201211083801.1632-1-zhengyongjun3@huawei.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bb7eae6d

Merge branch 'add-devlink-and-devlink-health-reporters-to' · 8718d60e

Jakub Kicinski authored Dec 14, 2020

George Cherian says:

====================
Add devlink and devlink health reporters to octeontx2

Add basic devlink and devlink health reporters.
Devlink health reporters are added for NPA block.

Address Jakub's comment to add devlink support for error reporting.
https://www.spinics.net/lists/netdev/msg670712.html

For now, I have dropped the NIX block health reporters.
This series attempts to add health reporters only for the NPA block.
As per Jakub's suggestion separate reporters per event is used and also
got rid of the counters.
====================

Link: https://lore.kernel.org/r/20201211062526.2302643-1-george.cherian@marvell.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

8718d60e

docs: octeontx2: Add Documentation for NPA health reporters · 80b94148

George Cherian authored Dec 11, 2020

Add Documentation for devlink health reporters for NPA block.
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

80b94148

octeontx2-af: Add devlink health reporters for NPA · f1168d1e

George Cherian authored Dec 11, 2020

Add health reporters for RVU NPA block.
NPA Health reporters handle following HW event groups
 - GENERAL events
 - ERROR events
 - RAS events
 - RVU event

Output:
 #devlink health
 pci/0002:01:00.0:
   reporter hw_npa_intr
     state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true
   reporter hw_npa_gen
     state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true
   reporter hw_npa_err
     state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true
   reporter hw_npa_ras
     state healthy error 0 recover 0 grace_period 0 auto_recover true
 auto_dump true

 #devlink health dump show  pci/0002:01:00.0 reporter hw_npa_err
 NPA_AF_ERR:
        NPA Error Interrupt Reg : 4096
        AQ Doorbell Error
 #devlink health dump show  pci/0002:01:00.0 reporter hw_npa_ras
 NPA_AF_RVU_RAS:
        NPA RAS Interrupt Reg : 0

 Each reporter dump shows the Register value and the description of the
cause.
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f1168d1e

octeontx2-af: Add devlink suppoort to af driver · fae06da4

George Cherian authored Dec 11, 2020

Add devlink support to AF driver. Basic devlink support is added.
Currently info_get is the only supported devlink ops.

devlink ouptput looks like this
 # devlink dev
 pci/0002:01:00.0
 # devlink dev info
 pci/0002:01:00.0:
  driver octeontx2-af
 #
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: George Cherian <george.cherian@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fae06da4

selftests: test_vxlan_under_vrf: mute unnecessary error message · 0e12c027

Po-Hsu Lin authored Dec 11, 2020

The cleanup function in this script that tries to delete hv-1 / hv-2
vm-1 / vm-2 netns will generate some uncessary error messages:

Cannot remove namespace file "/run/netns/hv-2": No such file or directory
Cannot remove namespace file "/run/netns/vm-1": No such file or directory
Cannot remove namespace file "/run/netns/vm-2": No such file or directory

Redirect it to /dev/null like other commands in the cleanup function
to reduce confusion.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Link: https://lore.kernel.org/r/20201211042420.16411-1-po-hsu.lin@canonical.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0e12c027

net: Limit logical shift left of TCP probe0 timeout · 6d4634d1

Cambda Zhu authored Dec 08, 2020

For each TCP zero window probe, the icsk_backoff is increased by one and
its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the
shift count would be masked to range 0 to 63. And on ARMv7 the result is
zero. If the shift count is masked, only several probes will be sent
with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it
needs tcp_retries2 times probes to end this false timeout. Besides,
bitwise shift greater than or equal to the width is an undefined
behavior.

This patch adds a limit to the backoff. The max value of max_when is
TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit
is the backoff from TCP_RTO_MIN to TCP_RTO_MAX.
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20201208091910.37618-1-cambda@linux.alibaba.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

6d4634d1

Merge branch 'mptcp-another-set-of-miscellaneous-mptcp-fixes' · ebf32282

Jakub Kicinski authored Dec 14, 2020

Mat Martineau says:

====================
mptcp: Another set of miscellaneous MPTCP fixes

This is another collection of MPTCP fixes and enhancements that we have
tested in the MPTCP tree:

Patch 1 cleans up cgroup attachment for in-kernel subflow sockets.

Patches 2 and 3 make sure that deletion of advertised addresses by an
MPTCP path manager when flushing all addresses behaves similarly to the
remove-single-address operation, and adds related tests.

Patches 4 and 8 do some minor cleanup.

Patches 5-7 add MPTCP_FASTCLOSE functionality. Note that patch 6 adds MPTCP
option parsing to tcp_reset().

Patch 9 optimizes skb size for outgoing MPTCP packets.
====================

Link: https://lore.kernel.org/r/20201210222506.222251-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

ebf32282

mptcp: let MPTCP create max size skbs · 15e6ca97

Paolo Abeni authored Dec 10, 2020

Currently the xmit path of the MPTCP protocol creates smaller-
than-max-size skbs, which is suboptimal for the performances.

There are a few things to improve:
- when coalescing to an existing skb, must clear the PUSH flag
- tcp_build_frag() expect the available space as an argument.
  When coalescing is enable MPTCP already subtracted the
  to-be-coalesced skb len. We must increment said argument
  accordingly.

Before:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072  16384  16384    30.00    24414.86

After:
./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM
[...]
131072  16384  16384    30.05    28357.69
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

15e6ca97

mptcp: pm: simplify select_local_address() · 1bc7327b

Paolo Abeni authored Dec 10, 2020

There is no need to unconditionally acquire the join list
lock, we can simply splice the join list into the subflow
list and traverse only the latter.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

1bc7327b

mptcp: parse and act on incoming FASTCLOSE option · 50c504a2

Florian Westphal authored Dec 10, 2020

parse the MPTCP FASTCLOSE subtype.

If provided key matches the local one, schedule the work queue to close
(with tcp reset) all subflows.

The MPTCP socket moves to closed state immediately.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

50c504a2

tcp: parse mptcp options contained in reset packets · 049fe386

Florian Westphal authored Dec 10, 2020

Because TCP-level resets only affect the subflow, there is a MPTCP
option to indicate that the MPTCP-level connection should be closed
immediately without a mptcp-level fin exchange.

This is the 'MPTCP fast close option'.  It can be carried on ack
segments or TCP resets.  In the latter case, its needed to parse mptcp
options also for reset packets so that MPTCP can act accordingly.

Next patch will add receive side fastclose support in MPTCP.
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

049fe386

mptcp: hold mptcp socket before calling tcp_done · ab82e996

Florian Westphal authored Dec 10, 2020

When processing options from tcp reset path its possible that
tcp_done(ssk) drops the last reference on the mptcp socket which
results in use-after-free.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ab82e996

mptcp: use MPTCPOPT_HMAC_LEN macro · ba34c3de

Geliang Tang authored Dec 10, 2020

Use the macro MPTCPOPT_HMAC_LEN instead of a constant in struct
mptcp_options_received.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ba34c3de

selftests: mptcp: add the flush addrs testcase · 6fe4ccdc

Geliang Tang authored Dec 10, 2020

This patch added the flush addrs testcase. In do_transfer, if the number
of removing addresses is less than 8, use the del addr command to remove
the addresses one by one. If the number is more than 8, use the flush addrs
command to remove the addresses.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

6fe4ccdc

mptcp: remove address when netlink flushes addrs · 141694df

Geliang Tang authored Dec 10, 2020

When the PM netlink flushes the addresses, invoke the remove address
function mptcp_nl_remove_subflow_and_signal_addr to remove the addresses
and the subflows. Since this function should not be invoked under lock,
move __flush_addrs out of the pernet->lock.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

141694df

mptcp: attach subflow socket to parent cgroup · 3764b0c5

Nicolas Rybowski authored Dec 10, 2020

It has been observed that the kernel sockets created for the subflows
(except the first one) are not in the same cgroup as their parents.
That's because the additional subflows are created by kernel workers.

This is a problem with eBPF programs attached to the parent's
cgroup won't be executed for the children. But also with any other features
of CGroup linked to a sk.

This patch fixes this behaviour.

As the subflow sockets are created by the kernel, we can't use
'mem_cgroup_sk_alloc' because of the current context being the one of the
kworker. This is why we have to do low level memcg manipulation, if
required.
Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3764b0c5