Commits · c331ecf1afc1211ce927cc4bd3a978b3655c0854 · Kirill Smelkov / linux

21 Nov, 2020 29 commits

net: hns3: adds debugfs to dump more info of shaping parameters · c331ecf1

Yonglong Liu authored Nov 20, 2020

Adds debugfs to dump new shaping parameters: rate and flag.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c331ecf1

net: hns3: add support to utilize the firmware calculated shaping parameters · e364ad30

Yonglong Liu authored Nov 20, 2020

Since the calculation of the driver is fixed, if the number of
queue or clock changed, the calculated result may be inaccurate.

So for compatible and maintainable, add a new flag to tell the
firmware to calculate the shaping parameters with the specified
rate.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e364ad30

net: hns3: add support for pf querying new interrupt resources · 3a6863e4

Yufeng Mo authored Nov 20, 2020

For HNAE3_DEVICE_VERSION_V3, a maximum of 1281 interrupt
resources are supported. To utilize these new resources,
extend the corresponding field or variable to 16bit type,
and remove the restriction of NIC client that only use a
maximum of 65 interrupt vectors. In addition, the I/O address
of the extended interrupt resources are different, so an extra
handler is needed.

Currently, the total number of interrupts is the sum of RoCE's
number and RoCE's offset (RoCE is in front of NIC), since
the number of both NIC and RoCE are same. For readability,
rewrite the corresponding field of the command, rename the
RoCE's offset field as the number of NIC interrupts, then
the total number of interrupts is sum of the number of RoCE
and NIC, and replace vport->back with hdev in
hclge_init_roce_base_info() for simplifying the code.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

3a6863e4

net: hns3: add support for mapping device memory · 30ae7f8a

Huazhong Tan authored Nov 20, 2020

For device who has device memory accessed through the PCI BAR4,
IO descriptor push of NIC and direct WQE(Work Queue Element) of
RoCE will use this device memory, so add support for mapping
this device memory, and add this info to the RoCE client whose
new feature needs.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

30ae7f8a

net: hns3: add support for 1280 queues · 9a5ef4aa

Yonglong Liu authored Nov 20, 2020

For DEVICE_VERSION_V1/2, there are total 1024 queues and
queue sets. For DEVICE_VERSION_V3, it increases to 1280,
and can be assigned to one pf， so remove the limitation
of 1024.

To keep compatible with DEVICE_VERSION_V1/2 and old driver
version, the queue number is split into two part:
tqp_num(range 0~1023) and ext_tqp_num(range 1024~1279).
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9a5ef4aa

Merge branch 'ibmvnic-performance-improvements-and-other-updates' · 16de5970

Jakub Kicinski authored Nov 20, 2020

Thomas Falcon says:

====================
ibmvnic: Performance improvements and other updates

The first three patches utilize a hypervisor call allowing multiple
TX and RX buffer replenishment descriptors to be sent in one operation,
which significantly reduces hypervisor call overhead. The xmit_more
and Byte Queue Limit API's are leveraged to provide this support
for TX descriptors.

The subsequent two patches remove superfluous code and members in
TX completion handling function and TX buffer structure, respectively,
and remove unused routines.

Finally, four patches which ensure that device queue memory is
cache-line aligned, resolving slowdowns observed in PCI traces,
as well as optimize the driver's NAPI polling function and
to RX buffer replenishment are provided by Dwip Banerjee.

This series provides significant performance improvements, allowing
the driver to fully utilize 100Gb NIC's.
====================

Link: https://lore.kernel.org/r/1605748345-32062-1-git-send-email-tlfalcon@linux.ibm.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

16de5970

ibmvnic: Do not replenish RX buffers after every polling loop · 41ed0a00

Dwip N. Banerjee authored Nov 18, 2020

Reduce the amount of time spent replenishing RX buffers by only doing
so once available buffers has fallen under a certain threshold, in this
case half of the total number of buffers, or if the polling loop exits
before the packets processed is less than its budget. Non-exhaustion of
NAPI budget implies lower incoming packet pressure, allowing the leeway
to refill the buffers in preparation for any impending burst.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

41ed0a00

ibmvnic: Use netdev_alloc_skb instead of alloc_skb to replenish RX buffers · e552aa31

Dwip N. Banerjee authored Nov 18, 2020

Take advantage of the additional optimizations in netdev_alloc_skb when
allocating socket buffers to be used for packet reception.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e552aa31

ibmvnic: Correctly re-enable interrupts in NAPI polling routine · ec20f36b

Dwip N. Banerjee authored Nov 18, 2020

If the current NAPI polling loop exits without completing it's
budget, only re-enable interrupts if there are no entries remaining
in the queue and napi_complete_done is successful. If there are entries
remaining on the queue that were missed, restart the polling loop.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ec20f36b

ibmvnic: Ensure that device queue memory is cache-line aligned · 9a87c3fc

Dwip N. Banerjee authored Nov 18, 2020

PCI bus slowdowns were observed on IBM VNIC devices as a result
of partial cache line writes and non-cache aligned full cache line writes.
Ensure that packet data buffers are cache-line aligned to avoid these
slowdowns.
Signed-off-by: Dwip N. Banerjee <dnbanerg@us.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

9a87c3fc

ibmvnic: Remove send_subcrq function · 8ed589f3

Thomas Falcon authored Nov 18, 2020

It is not longer used, so remove it.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8ed589f3

ibmvnic: Clean up TX code and TX buffer data structure · c62aa373

Thomas Falcon authored Nov 18, 2020

Remove unused and superfluous code and members in
existing TX implementation and data structures.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

c62aa373

ibmvnic: Introduce xmit_more support using batched subCRQ hcalls · 0d973388

Thomas Falcon authored Nov 18, 2020

Include support for the xmit_more feature utilizing the
H_SEND_SUB_CRQ_INDIRECT hypervisor call which allows the sending
of multiple subordinate Command Response Queue descriptors in one
hypervisor call via a DMA-mapped buffer. This update reduces hypervisor
calls and thus hypervisor call overhead per TX descriptor.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0d973388

ibmvnic: Introduce batched RX buffer descriptor transmission · 4f0b6812

Thomas Falcon authored Nov 18, 2020

Utilize the H_SEND_SUB_CRQ_INDIRECT hypervisor call to send
multiple RX buffer descriptors to the device in one hypervisor
call operation. This change will reduce the number of hypervisor
calls and thus hypervisor call overhead needed to transmit
RX buffer descriptors to the device.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4f0b6812

ibmvnic: Introduce indirect subordinate Command Response Queue buffer · f019fb63

Thomas Falcon authored Nov 18, 2020

This patch introduces the infrastructure to send batched subordinate
Command Response Queue descriptors, which are used by the ibmvnic
driver to send TX frame and RX buffer descriptors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Acked-by: Lijun Pan <ljp@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f019fb63

Merge branch 'net-ipa-add-a-driver-shutdown-callback' · c9003783

Jakub Kicinski authored Nov 20, 2020

Alex Elder says:

====================
net: ipa: add a driver shutdown callback

The final patch in this series adds a driver shutdown callback for
the IPA driver.  The patches leading up to that address some issues
encountered while ensuring that callback worked as expected:
  - The first just reports a little more information when channels
    or event rings are in unexpected states
  - The second patch recognizes a condition where an as-yet-unused
    channel does not require a reset during teardown
  - The third patch explicitly ignores a certain error condition,
    because it can't be avoided, and is harmless if it occurs
  - The fourth properly handles requests to retry a channel HALT
    request
  - The fifth makes a second attempt to stop modem activity during
    shutdown if it's busy

The shutdown callback is implemented by calling the existing remove
callback function (reporting if that returns an error).
====================

Link: https://lore.kernel.org/r/20201119224929.23819-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

c9003783

net: ipa: add driver shutdown callback · ae1d72f9

Alex Elder authored Nov 19, 2020

A system shutdown can happen at essentially any time, and it's
possible that the IPA driver is busy when a shutdown is underway.
IPA hardware accesses IMEM and SMEM memory regions using an IOMMU,
and at some point during shutdown, needed I/O mappings could become
invalid.  This could be disastrous for any "in flight" IPA activity.

Avoid this by defining a new driver shutdown callback that stops all
IPA activity and cleanly shuts down the driver.  It merely calls the
driver's existing remove callback, reporting the error if it returns
one.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ae1d72f9

net: ipa: retry modem stop if busy · 7c80e838

Alex Elder authored Nov 19, 2020

The IPA driver remove callback, ipa_remove(), calls ipa_modem_stop()
if the setup stage of initialization is complete.  If a concurrent
call to ipa_modem_start() or ipa_modem_stop() has begin but not
completed, ipa_modem_stop() can return an error (-EBUSY).

The next patch adds a driver shutdown callback, which will simply
call ipa_remove().  We really want our shutdown callback to clean
things up.  So add a single retry to the ipa_modem_stop() call in
ipa_remove() after a short (millisecond) delay.  This offers no
guarantee the shutdown will complete successfully, but we'll at
least try a little harder before giving up.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7c80e838

net: ipa: support retries on generic GSI commands · 11361456

Alex Elder authored Nov 19, 2020

When stopping an AP RX channel, there can be a transient period
while the channel enters STOP_IN_PROC state before reaching the
final STOPPED state.  In that case we make another attempt to stop
the channel.

Similarly, when stopping a modem channel (using a GSI generic
command issued from the AP), it's possible that multiple attempts
will be required before the channel reaches STOPPED state.

Add a field to the GSI structure to record an errno representing the
result code provided when a generic command completes.  If the
result learned in gsi_isr_gp_int1() is RETRY, record -EAGAIN in the
result code, otherwise record 0 for success, or -EIO for any other
result.

If we time out nf gsi_generic_command() waiting for the command to
complete, return -ETIMEDOUT (as before).  Otherwise return the
result stashed by gsi_isr_gp_int1().

Add a loop in gsi_modem_channel_halt() to reissue the HALT command
if the result code indicates -EAGAIN.  Limit this to 10 retries
(after the initial attempt).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

11361456

net: ipa: ignore CHANNEL_NOT_RUNNING errors · f849afcc

Alex Elder authored Nov 19, 2020

IPA v4.2 has a hardware quirk that requires the AP to allocate GSI
channels for the modem to use.  It is recommended that these modem
channels get stopped (with a HALT generic command) by the AP when
its IPA driver gets removed.

The AP has no way of knowing the current state of a modem channel.
So when the IPA driver issues a HALT command it's possible the
channel is not running, and in that case we get an error indication.
This error simply means we didn't need to stop the channel, so we
can ignore it.

This patch adds an explanation for this situation, and arranges for
this condition to *not* report an error message.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f849afcc

net: ipa: don't reset an ALLOCATED channel · 5d28913d

Alex Elder authored Nov 19, 2020

If the rmnet_ipa0 network device has not been opened at the time
we remove or shut down the IPA driver, its underlying TX and RX
GSI channels will not have been started, and they will still be
in ALLOCATED state.

The RESET command on a channel is meant to return a channel to
ALLOCATED state after it's been stopped.  But if it was never
started, its state will still be ALLOCATED, the RESET command
is not required.

Quietly skip doing the reset without printing an error message if a
channel is already in ALLOCATED state when we request it be reset.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

5d28913d

net: ipa: print channel/event ring number on error · f8d3bdd5

Alex Elder authored Nov 19, 2020

When a GSI command is used to change the state of a channel or event
ring we check the state before and after the command to ensure it is
as expected.  If not, we print an error message, but it does not
include the channel or event ring id, and it easily can.  Add the
channel or event ring id to these error messages.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f8d3bdd5

Merge branch 'net-ipa-platform-specific-clock-and-interconnect-rates' · 0ee6de26

Jakub Kicinski authored Nov 20, 2020

Alex Elder says:

====================
net: ipa: platform-specific clock and interconnect rates

This series changes the way the IPA core clock rate and the
bandwidth parameters for interconnects are specified.  Previously
these were specified with hard-wired constants, with the same values
used for the SDM845 and SC7180 platforms.  Now these parameters are
recorded in platform-specific configuration data.

For the SC7180 this means we use an all-new core clock rate and
interconnect parameters.

Additionally, while developing this I learned that the average
bandwidth setting for two of the interconnects is ignored (on both
platforms).  Zero is now used explicitly as that unused bandwidth
value.  This means the SDM845 bandwidth settings are also changed
by this series.
====================

Link: https://lore.kernel.org/r/20201119224041.16066-1-elder@linaro.orgSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0ee6de26

net: ipa: use config data for clocking · 91d02f95

Alex Elder authored Nov 19, 2020

Stop assuming a fixed IPA core clock rate and interconnect
bandwidths.  Use the configuration data defined for these
things instead.  Get rid of the previously-used constants.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

91d02f95

net: ipa: populate clock and interconnect data · f08c9922

Alex Elder authored Nov 19, 2020

Populate the core clock rate and interconnect average and peak
bandwidth data for SDM845 and SC7180 in their configuration data
files.  At this point we still don't *use* this data.

Note that SC7180 actually defines a new core clock rate (100 MHz
instead of 75 MHz) and new interconnect bandwidth values.  They
will be activated in the next commit, which uses the configured
values rather than the fixed constants.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f08c9922

net: ipa: define clock and interconnect data · dfccb8b1

Alex Elder authored Nov 19, 2020

Define a new type of configuration data, used to initialize the
IPA core clock and interconnects.  This is the first of three
patches, and defines the data types and interface but doesn't
yet use them.

Switch the return value if there is no matching configuration data
to ENODEV instead of ENOTSUPP (to avoid using the nonstandard errno).
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dfccb8b1

mdio_bus: suppress err message for reset gpio EPROBE_DEFER · 0a12ad59

Grygorii Strashko authored Nov 19, 2020

The mdio_bus may have dependencies from GPIO controller and so got
deferred. Now it will print error message every time -EPROBE_DEFER is
returned which from:
__mdiobus_register()
 |-devm_gpiod_get_optional()
without actually identifying error code.

"mdio_bus 4a101000.mdio: mii_bus 4a101000.mdio couldn't get reset GPIO"

Hence, suppress error message for devm_gpiod_get_optional() returning
-EPROBE_DEFER case by using dev_err_probe().
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Link: https://lore.kernel.org/r/20201119203446.20857-1-grygorii.strashko@ti.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

0a12ad59

r8169: use dev_err_probe in rtl_get_ether_clk · bf7b0bf6

Heiner Kallweit authored Nov 19, 2020

Tiny improvement, let dev_err_probe() deal with EPROBE_DEFER.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/b0c4ebcf-2047-e933-b890-8a20e4bdb19f@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

bf7b0bf6

r8169: reduce number of workaround doorbell rings · 94d8a98e

Heiner Kallweit authored Nov 19, 2020

Some chip versions have a hw bug resulting in lost door bell rings.
To work around this the doorbell is also rung whenever we still have
tx descriptors in flight after having cleaned up tx descriptors.
These PCI(e) writes come at a cost, therefore let's reduce the number
of extra doorbell rings.
If skb is NULL then this means:
- last cleaned-up descriptor belongs to a skb with at least one fragment
  and last fragment isn't marked as sent yet
- hw is in progress sending the skb, therefore no extra doorbell ring
  is needed for this skb
- once last fragment is marked as transmitted hw will trigger
  a tx done interrupt and we come here again (with skb != NULL)
  and ring the doorbell if needed
Therefore skip the workaround doorbell ring if skb is NULL.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/0a15a83c-aecf-ab51-8071-b29d9dcd529a@gmail.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

94d8a98e

20 Nov, 2020 11 commits

Merge branch 'mptcp-more-miscellaneous-mptcp-fixes' · 9e8ac63f

Jakub Kicinski authored Nov 20, 2020

Mat Martineau says:

====================
mptcp: More miscellaneous MPTCP fixes

Here's another batch of fixup and enhancement patches that we have
collected in the MPTCP tree.

Patch 1 removes an unnecessary flag and related code.

Patch 2 fixes a bug encountered when closing fallback sockets.

Patches 3 and 4 choose a better transmit subflow, with a self test.

Patch 5 adjusts tracking of unaccepted subflows

Patches 6-8 improve handling of long ADD_ADDR options, with a test.

Patch 9 more reliably tracks the MPTCP-level window shared with peers.

Patch 10 sends MPTCP-level acknowledgements more aggressively, so the
peer can send more data without extra delay.
====================

Link: https://lore.kernel.org/r/20201119194603.103158-1-mathew.j.martineau@linux.intel.comSigned-off-by: Jakub Kicinski <kuba@kernel.org>

9e8ac63f

mptcp: refine MPTCP-level ack scheduling · ea4ca586

Paolo Abeni authored Nov 19, 2020

Send timely MPTCP-level ack is somewhat difficult when
the insertion into the msk receive level is performed
by the worker.

It needs TCP-level dup-ack to notify the MPTCP-level
ack_seq increase, as both the TCP-level ack seq and the
rcv window are unchanged.

We can actually avoid processing incoming data with the
worker, and let the subflow or recevmsg() send ack as needed.

When recvmsg() moves the skbs inside the msk receive queue,
the msk space is still unchanged, so tcp_cleanup_rbuf() could
end-up skipping TCP-level ack generation. Anyway, when
__mptcp_move_skbs() is invoked, a known amount of bytes is
going to be consumed soon: we update rcv wnd computation taking
them in account.

Additionally we need to explicitly trigger tcp_cleanup_rbuf()
when recvmsg() consumes a significant amount of the receive buffer.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ea4ca586

mptcp: track window announced to peer · fa3fe2b1

Florian Westphal authored Nov 19, 2020

OoO handling attempts to detect when packet is out-of-window by testing
current ack sequence and remaining space vs. sequence number.

This doesn't work reliably. Store the highest allowed sequence number
that we've announced and use it to detect oow packets.

Do this when mptcp options get written to the packet (wire format).
For this to work we need to move the write_options call until after
stack selected a new tcp window.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

fa3fe2b1

selftests: mptcp: add ADD_ADDR IPv6 test cases · 523514ed

Geliang Tang authored Nov 19, 2020

This patch added IPv6 support for do_transfer, and the test cases for
ADD_ADDR IPv6.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

523514ed

mptcp: send out dedicated ADD_ADDR packet · 84dfe367

Geliang Tang authored Nov 19, 2020

When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets.
It will not fit when other MPTCP suboptions are included in this packet,
e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only
ADD_ADDR suboption, no other MPTCP suboptions.

In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR.
If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call
mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet.

In mptcp_established_options_add_addr, we check whether this is a pure
ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions
in this packet, only put ADD_ADDR suboption in it.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

84dfe367

mptcp: change add_addr_signal type · d91d322a

Geliang Tang authored Nov 19, 2020

This patch changed the 'add_addr_signal' type from bool to char, so that
we could encode the addr type there.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

d91d322a

mptcp: keep unaccepted MPC subflow into join list · 0397c6d8

Paolo Abeni authored Nov 19, 2020

This will simplify all operation dealing with subflows
before accept time (e.g. data fin processing, add_addr).

The join list is already flushed by mptcp_stream_accept()
before returning the newly created msk to the user space.

This also fixes an potential bug present into the old code:
conn_list was manipulated without helding the msk lock
in mptcp_stream_accept().
Tested-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0397c6d8

selftests: mptcp: add link failure test case · 8b819a84

Florian Westphal authored Nov 19, 2020

Add a test case where a link fails with multiple subflows.
The expectation is that MPTCP will transmit any data that
could not be delivered via the failed link on another subflow.
Co-developed-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

8b819a84

mptcp: skip to next candidate if subflow has unacked data · 860975c6

Florian Westphal authored Nov 19, 2020

In case a subflow path is blocked, MPTCP-level retransmit may not take
place anymore because such subflow is likely to have unacked data left
in its write queue.

Ignore subflows that have experienced loss and test next candidate.

Fixes: 3b1d6210 ("mptcp: implement and use MPTCP-level retransmission")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

860975c6

mptcp: fix state tracking for fallback socket · 26aa2314

Paolo Abeni authored Nov 19, 2020

We need to cope with some more state transition for
fallback sockets, or could still end-up moving to TCP_CLOSE
too early and avoid spooling some pending data

Fixes: e16163b6 ("mptcp: refactor shutdown and close")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

26aa2314

mptcp: drop WORKER_RUNNING status bit · b2771d24

Paolo Abeni authored Nov 19, 2020

Only mptcp_close() can actually cancel the workqueue,
no need to add and use this flag.
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b2771d24