Commits · 61a57e51e46e8eb7df8a3acff2e6da279f2161a3 · Kirill Smelkov / linux

09 Dec, 2020 1 commit

ath11k: fix rmmod failure if qmi sequence fails · 61a57e51

Anilkumar Kolli authored Dec 08, 2020

QMI sequence fails if caldata file is not available.
It is observed that 'rmmod ath11k' fails if qmi message fails.
With this patch rmmod/insmod is working.

Logs:
Direct firmware load for IPQ8074/caldata.bin failed with error -2
Falling back to user helper
qmi failed to load CAL: IPQ8074/caldata.bin
qmi failed to load board data file:-11

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-00009-QCAHKSWPL_SILICONZ-1
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01699-QCAHSTSWPLZ_V2_TO_X86-1
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1606916215-24643-1-git-send-email-akolli@codeaurora.org

61a57e51

08 Dec, 2020 2 commits

carl9170: remove trailing semicolon in macro definition · e65e8b60

Tom Rix authored Nov 27, 2020

The macro use will already have a semicolon.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201127175531.2754461-1-trix@redhat.com

e65e8b60
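
As an illustration of the commit above, here is a minimal sketch of why a macro definition should not carry its own trailing semicolon. The macro and function names are hypothetical and are not the carl9170 definitions.

```c
/* Hypothetical macros for illustration; not the carl9170 definitions. */

/*
 * Bad: the definition carries its own ';'. Written as "BAD_SET(x, 1);"
 * the call site expands to "(x) = (1);" followed by an empty ";".
 */
#define BAD_SET(var, val)  (var) = (val);

/* Good: no trailing ';', so the caller's semicolon ends the statement. */
#define GOOD_SET(var, val) ((var) = (val))

int pick(int cond)
{
	int x = 0;

	/*
	 * With BAD_SET here, the extra empty statement would terminate the
	 * "if" before the "else" is reached, which does not compile.
	 * GOOD_SET is a single expression statement, so if/else works.
	 */
	if (cond)
		GOOD_SET(x, 1);
	else
		GOOD_SET(x, 2);

	return x;
}
```

Calling `pick(1)` takes the first branch and `pick(0)` the second; swapping in `BAD_SET` makes the function fail to build.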

ath11k: pci: add MODULE_FIRMWARE macros · 3dbd7fe7

Devin Bayer authored Dec 07, 2020

I am trying to get the ath11k driver to work with VyOS and during the
build it tries to discover the firmware blobs which drivers require.

This doesn't work with ath11k because it doesn't use the MODULE_FIRMWARE
macro. This patch fixes that.
Signed-off-by: Devin Bayer <dev@doubly.so>
[kvalo@codeaurora.org: cleanup commit log, move to pci.c]
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201202182705.dhkml4nb4rf2vwav@orac

3dbd7fe7

07 Dec, 2020 12 commits

ath9k: remove trailing semicolon in macro definition · 5a5b820d

Tom Rix authored Dec 07, 2020

The macro use will already have a semicolon.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201127175336.2752730-1-trix@redhat.com

5a5b820d

ath11k: Ignore resetting peer auth flag in peer assoc cmd · 1daf58b2

Seevalamuthu Mariappan authored Dec 07, 2020

In case of hardware encryption, the WMI_PEER_AUTH flag is set by firmware
during install key. Since install key won't be done in software encryption
mode, firmware will not set this flag. As a result, traffic fails
in software encryption mode. Hence, avoid resetting the peer auth flag if
hardware encryption is disabled.

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01421-QCAHKSWPL_SILICONZ-1
Signed-off-by: Seevalamuthu Mariappan <seevalam@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1606369414-25211-1-git-send-email-seevalam@codeaurora.org

1daf58b2

ath11k: add 64bit check before reading msi high addr · e8e55d89

Anilkumar Kolli authored Dec 07, 2020

During QCN9074 ath11k boot, a firmware crash is observed in 64-bit
builds, caused by a wrong 64-bit MSI address size. This patch
fixes the firmware crash: read the MSI high address only if 64-bit
addresses are allowed on MSI.

Tested-On: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1.r1-00026-QCAHKSWPL_SILICONZ-2
Signed-off-by: Anilkumar Kolli <akolli@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1606199334-18206-1-git-send-email-akolli@codeaurora.org

e8e55d89
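
The check described in the commit above can be sketched in userspace as follows. The offsets and flag value match `<linux/pci_regs.h>`, but the helper names and the byte-array model of the MSI capability are assumptions made for illustration; this is not the ath11k code.

```c
#include <stdint.h>

/* Offsets are relative to the start of the MSI capability. */
#define PCI_MSI_FLAGS        0x02    /* Message Control */
#define PCI_MSI_ADDRESS_LO   0x04
#define PCI_MSI_ADDRESS_HI   0x08    /* only present with 64-bit MSI */
#define PCI_MSI_FLAGS_64BIT  0x0080

/* Little-endian readers over a raw copy of the capability (hypothetical). */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Build the MSI target address, reading the high half only when valid. */
uint64_t msi_target_addr(const uint8_t *cap)
{
	uint16_t flags = rd16(cap + PCI_MSI_FLAGS);
	uint64_t addr = rd32(cap + PCI_MSI_ADDRESS_LO);

	/*
	 * Without this check, a 32-bit capable device would have whatever
	 * sits at offset 0x08 (the message data in the 32-bit layout)
	 * treated as the upper address bits.
	 */
	if (flags & PCI_MSI_FLAGS_64BIT)
		addr |= (uint64_t)rd32(cap + PCI_MSI_ADDRESS_HI) << 32;

	return addr;
}
```

Whether this is exactly how the crash arose is not stated in the commit message; the sketch only shows the guarded read.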

ath10k: fix a check patch warning returnNonBoolInBooleanFunction of sdio.c · 7f881a72

Wen Gong authored Dec 07, 2020

cppcheck possible warnings: (new ones prefixed by >>, may not real problems)
drivers/net/wireless/ath/ath10k/sdio.c:2234:2:
warning: Non-boolean value returned from function returning bool [returnNonBoolInBooleanFunction]
return param & HI_OPTION_SDIO_CRASH_DUMP_ENHANCEMENT_FW;
Reported-by: kernel test robot <rong.a.chen@intel.com>

Tested-on: QCA6174 hw3.2 SDIO WLAN.RMH.4.4.1-00049
Signed-off-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1606103240-9868-1-git-send-email-wgong@codeaurora.org

7f881a72
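
For the cppcheck warning in the commit above, one common way to silence it is to make the boolean conversion explicit. The sketch below uses a hypothetical flag value and function names; the real `HI_OPTION_SDIO_CRASH_DUMP_ENHANCEMENT_FW` definition lives in the ath10k headers, and the commit message does not say which form the fix used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value, chosen only for this illustration. */
#define EXAMPLE_CRASH_DUMP_FLAG 0x200u

/* Flagged form: the returned expression has integer type, not bool. */
bool flag_set_implicit(uint32_t param)
{
	return param & EXAMPLE_CRASH_DUMP_FLAG;
}

/* Explicit form: "!!" reduces the result to 0 or 1 before returning. */
bool flag_set_explicit(uint32_t param)
{
	return !!(param & EXAMPLE_CRASH_DUMP_FLAG);
}
```

With a C99 `bool` both functions return the same value, since any non-zero integer converts to `true`; the explicit form mainly documents the intent and keeps static checkers quiet.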

Merge branch 'mlxsw-Misc-updates' · af3f4a85

David S. Miller authored Dec 06, 2020

Ido Schimmel says:

====================
mlxsw: Misc updates

This patchset contains miscellaneous patches we gathered in our queue.
Some of them are dependencies of larger patchsets that I will submit
later this cycle.

Patches #1-#3 perform small non-functional changes in mlxsw.

Patch #4 adds more extended ack messages in mlxsw.

Patch #5 adds devlink parameters documentation for mlxsw. To be extended
with more parameters this cycle.

Patches #6-#7 perform small changes in forwarding selftests
infrastructure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

af3f4a85

mlxsw: spectrum_router: Reduce mlxsw_sp_ipip_fib_entry_op_gre4() · acde33bf

Jiri Pirko authored Dec 06, 2020

It turned out that mlxsw_sp_ipip_fib_entry_op_gre4() does not need to
figure out the IP address and virtual router id. Those are exactly
the same as in the fib_entry it is called for. So just use that,
reduce the mlxsw_sp_ipip_fib_entry_op_gre4() function to only call
mlxsw_sp_ipip_fib_entry_op_gre4_rtdp(), and make the ipip decap op
code similar to nve.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

acde33bf

mlxsw: spectrum: Bump minimum FW version to xx.2008.2018 · f54d3c81

Petr Machata authored Dec 06, 2020

The indicated version fixes an issue whereby the MOMTE register would by
default enable mirroring of ECN-marked traffic from all traffic classes,
once the ECN mirroring was configured. This fix is necessary for offload
of RED "ecn_mark" qevent.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f54d3c81

mlxsw: core_acl: Use an array instead of a struct with a zero-length array · 9add5f19

Ido Schimmel authored Dec 06, 2020

Suppresses the following coccinelle warning:

drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c:139:3-7:
WARNING use flexible-array member instead
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9add5f19

mlxsw: spectrum_mr: Use flexible-array member instead of zero-length array · 42c435a2

Ido Schimmel authored Dec 06, 2020

Suppresses the following coccinelle warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_mr.c:18:15-19: WARNING use flexible-array member instead
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

42c435a2

mlxsw: core: Trace EMAD events · 4834ad80

Ido Schimmel authored Dec 06, 2020

Currently, mlxsw triggers the 'devlink:devlink_hwmsg' tracepoint
whenever a request is sent to the device and whenever a response is
received from it. However, the tracepoint is not triggered when an event
(e.g., port up / down) is received from the device.

Also trace EMAD events in order to log a more complete picture of all
the exchanged hardware messages.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4834ad80

selftests: mlxsw: Test RIF's reference count when joining a LAG · 23fb5552

Ido Schimmel authored Dec 06, 2020

Test that the reference count of a router interface (RIF) configured for
a LAG is incremented / decremented when ports join / leave the LAG. Use
the offload indication on routes configured on the RIF to understand if
it was created / destroyed.

The test fails without the previous patch.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23fb5552

mlxsw: spectrum: Apply RIF configuration when joining a LAG · 31e1de4f

Ido Schimmel authored Dec 06, 2020

In case a router interface (RIF) is configured for a LAG, make sure its
configuration is applied on the new LAG member.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31e1de4f

06 Dec, 2020 3 commits

Merge branch 'r8169-improve-rtl_rx-and-NUM_RX_DESC-handling' · 4054eebf

David S. Miller authored Dec 05, 2020

Heiner Kallweit says:

====================
r8169: improve rtl_rx and NUM_RX_DESC handling

This series improves rtl_rx() and the handling of NUM_RX_DESC.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4054eebf

r8169: make NUM_RX_DESC a signed int · ed22a8ff

Heiner Kallweit authored Dec 06, 2020

After recent changes there's no need any longer to define NUM_RX_DESC
as an unsigned value.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed22a8ff

r8169: improve rtl_rx · 2f53e9d7

Heiner Kallweit authored Dec 06, 2020

There's no need to check min(budget, NUM_RX_DESC). First, budget
(NAPI_POLL_WEIGHT = 64) is less than NUM_RX_DESC (256).
More importantly, even in case of budget > NUM_RX_DESC we could
safely continue processing descriptors as long as they are owned by
the CPU. In addition, replace rx_left with a normal counter variable,
which simplifies the code. Last but not least, there's no longer any
need to pass the budget as a u32.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f53e9d7
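
The loop shape described in the commit above (a plain counter bounded by the budget, stopping at the first descriptor the CPU does not own) can be sketched like this. The structure, the ownership bit, and the function name are hypothetical stand-ins, not the r8169 definitions.

```c
#include <stdint.h>

#define RING_SIZE 256            /* plays the role of NUM_RX_DESC */
#define DESC_OWN  0x80000000u    /* hypothetical "owned by the NIC" bit */

struct rx_ring {
	uint32_t status[RING_SIZE];
	unsigned int cur;            /* free-running index, wrapped by modulo */
};

/*
 * Handle up to `budget` descriptors and return how many were processed.
 * No min(budget, RING_SIZE) is needed: the ownership bit alone stops the
 * loop once the CPU runs out of descriptors it owns.
 */
int rx_poll(struct rx_ring *ring, int budget)
{
	int count;

	for (count = 0; count < budget; count++, ring->cur++) {
		unsigned int entry = ring->cur % RING_SIZE;

		if (ring->status[entry] & DESC_OWN)
			break;                      /* still owned by the NIC */

		/* ... the packet would be passed up the stack here ... */

		ring->status[entry] |= DESC_OWN;    /* hand it back to the NIC */
	}

	return count;
}
```

Because each processed descriptor is handed back to the NIC, a budget larger than the ring size still terminates after at most one full pass.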

05 Dec, 2020 9 commits

net: fix spelling mistake "wil" -> "will" in Kconfig · 00649542

Colin Ian King authored Dec 04, 2020

There is a spelling mistake in the Kconfig help text. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20201204194549.1153063-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

00649542

Merge tag 'batadv-next-pullrequest-20201204' of git://git.open-mesh.org/linux-merge · 78d6bb58

Jakub Kicinski authored Dec 05, 2020

Simon Wunderlich says:

====================
This cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - update include for min/max helpers, by Sven Eckelmann

 - add infrastructure and netlink functions for routing algo selection,
   by Sven Eckelmann (2 patches)

 - drop deprecated debugfs and sysfs support and obsoleted
   functionality, by Sven Eckelmann (3 patches)

 - drop unused include in fragmentation.c, by Simon Wunderlich

* tag 'batadv-next-pullrequest-20201204' of git://git.open-mesh.org/linux-merge:
  batman-adv: Drop unused soft-interface.h include in fragmentation.c
  batman-adv: Drop legacy code for auto deleting mesh interfaces
  batman-adv: Drop deprecated debugfs support
  batman-adv: Drop deprecated sysfs support
  batman-adv: Allow selection of routing algorithm over rtnetlink
  batman-adv: Prepare infrastructure for newlink settings
  batman-adv: Add new include for min/max helpers
  batman-adv: Start new development cycle
====================

Link: https://lore.kernel.org/r/20201204154631.21063-1-sw@simonwunderlich.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

78d6bb58

enetc: Fix unused var build warning for CONFIG_OF · 4560b2a3

Arnd Bergmann authored Dec 04, 2020

When CONFIG_OF is disabled, there is a harmless warning about
an unused variable:

enetc_pf.c: In function 'enetc_phylink_create':
enetc_pf.c:981:17: error: unused variable 'dev' [-Werror=unused-variable]

Slightly rearrange the code to pass around the of_node as a
function argument, which avoids the problem without hurting
readability.

Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Link: https://lore.kernel.org/r/20201204120800.17193-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4560b2a3

ptp: Add clock driver for the OpenCompute TimeCard. · a7e1abad

Jonathan Lemon authored Dec 03, 2020

The OpenCompute time card is an atomic clock along with
a GPS receiver that provides a Grandmaster clock source
for a PTP enabled network.

More information is available at http://www.timingcard.com/
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Link: https://lore.kernel.org/r/20201204035128.2219252-2-jonathan.lemon@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a7e1abad

net/nfc/nci: Support NCI 2.x initial sequence · bcd684aa

Bongsu Jeon authored Dec 03, 2020

Implement the NCI 2.x initial sequence to support NCI 2.x NFCC.
Since NCI 2.0, the CORE_RESET and CORE_INIT sequences have changed.
If NFCEE supports NCI 2.x, then NCI 2.x initial sequence will work.

In NCI 1.0, Initial sequence and payloads are as below:
(DH)                     (NFCC)
 |  -- CORE_RESET_CMD --> |
 |  <-- CORE_RESET_RSP -- |
 |  -- CORE_INIT_CMD -->  |
 |  <-- CORE_INIT_RSP --  |
 CORE_RESET_RSP payloads are Status, NCI version, Configuration Status.
 CORE_INIT_CMD payloads are empty.
 CORE_INIT_RSP payloads are Status, NFCC Features,
    Number of Supported RF Interfaces, Supported RF Interface,
    Max Logical Connections, Max Routing table Size,
    Max Control Packet Payload Size, Max Size for Large Parameters,
    Manufacturer ID, Manufacturer Specific Information.

In NCI 2.0, Initial Sequence and Parameters are as below:
(DH)                     (NFCC)
 |  -- CORE_RESET_CMD --> |
 |  <-- CORE_RESET_RSP -- |
 |  <-- CORE_RESET_NTF -- |
 |  -- CORE_INIT_CMD -->  |
 |  <-- CORE_INIT_RSP --  |
 CORE_RESET_RSP payloads are Status.
 CORE_RESET_NTF payloads are Reset Trigger,
    Configuration Status, NCI Version, Manufacturer ID,
    Manufacturer Specific Information Length,
    Manufacturer Specific Information.
 CORE_INIT_CMD payloads are Feature1, Feature2.
 CORE_INIT_RSP payloads are Status, NFCC Features,
    Max Logical Connections, Max Routing Table Size,
    Max Control Packet Payload Size,
    Max Data Packet Payload Size of the Static HCI Connection,
    Number of Credits of the Static HCI Connection,
    Max NFC-V RF Frame Size, Number of Supported RF Interfaces,
    Supported RF Interfaces.
Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Link: https://lore.kernel.org/r/20201202223147.3472-1-bongsu.jeon@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bcd684aa
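
The two message orders drawn in the commit above can be captured in a small lookup, shown here as a sketch. The enum, table, and function names are hypothetical and are not the kernel's NCI definitions; only the message order is taken from the diagrams.

```c
#include <stddef.h>

enum nci_msg {
	CORE_RESET_CMD,
	CORE_RESET_RSP,
	CORE_RESET_NTF,
	CORE_INIT_CMD,
	CORE_INIT_RSP,
};

/* Message order copied from the NCI 1.0 and NCI 2.0 diagrams. */
static const enum nci_msg nci1_seq[] = {
	CORE_RESET_CMD, CORE_RESET_RSP, CORE_INIT_CMD, CORE_INIT_RSP,
};

static const enum nci_msg nci2_seq[] = {
	CORE_RESET_CMD, CORE_RESET_RSP, CORE_RESET_NTF,
	CORE_INIT_CMD, CORE_INIT_RSP,
};

/*
 * Return 1 if `got` is the message expected at 0-based position `step`
 * of the initial sequence for the given NCI major version, else 0.
 */
int nci_step_ok(int major, size_t step, enum nci_msg got)
{
	const enum nci_msg *seq = (major >= 2) ? nci2_seq : nci1_seq;
	size_t n = (major >= 2) ? sizeof(nci2_seq) / sizeof(nci2_seq[0])
				: sizeof(nci1_seq) / sizeof(nci1_seq[0]);

	return step < n && seq[step] == got;
}
```

The only difference in order is the extra CORE_RESET_NTF at step 2 in NCI 2.0, which is also where the NCI version moves to according to the payload lists.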

selftests: forwarding: Add MPLS L2VPN test · 41fdfffd

Guillaume Nault authored Dec 02, 2020

Connect hosts H1 and H2 using two intermediate encapsulation routers
(LER1 and LER2). These routers encapsulate traffic from the hosts,
including the original Ethernet header, into MPLS.

Use ping to test reachability between H1 and H2.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/625f5c1aafa3a8085f8d3e082d680a82e16ffbaa.1606918980.git.gnault@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

41fdfffd

net: bna: remove trailing semicolon in macro definition · 0911d463

Tom Rix authored Dec 02, 2020

The macro use will already have a semicolon.
Clean up escaped newlines.
Signed-off-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20201202163622.3733506-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0911d463

tipc: support 128bit node identity for peer removing · 43fcd906

Hoang Le authored Dec 03, 2020

Add support for removing a specific down node by its 128-bit
node identifier, as an alternative to the legacy 32-bit node address.

example:
$ tipc peer remove identity <1001002|16777777>
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20201203035045.4564-1-hoang.h.le@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

43fcd906

nfp: Replace zero-length array with flexible-array member · 7f356166

Simon Horman authored Dec 04, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code
should always use "flexible array members"[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Link: https://lore.kernel.org/r/20201204125601.24876-1-simon.horman@netronome.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7f356166
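
A minimal userspace sketch of the flexible-array-member pattern that the commit above recommends. The structure and function names are hypothetical and not taken from the nfp driver; kernel code would typically size the allocation with the `struct_size()` helper rather than open-coded arithmetic.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure for illustration only. */
struct msg {
	size_t len;
	unsigned char data[];   /* flexible array member: must come last */
};

/* Allocate a msg with room for `len` trailing bytes and copy them in. */
struct msg *msg_alloc(const void *src, size_t len)
{
	/* sizeof(*m) does not include data[], so add the payload size. */
	struct msg *m = malloc(sizeof(*m) + len);

	if (!m)
		return NULL;

	m->len = len;
	memcpy(m->data, src, len);
	return m;
}
```

Compared with `data[0]` or `data[1]`, the `data[]` form is standard C99, and the compiler can reject a declaration where the array is not the last member.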

04 Dec, 2020 13 commits

nfc: s3fwrn5: skip the NFC bootloader mode · 4fb7b98c

Bongsu Jeon authored Dec 04, 2020

If there isn't a proper NFC firmware image, Bootloader mode will be
skipped.
Signed-off-by: Bongsu Jeon <bongsu.jeon@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Link: https://lore.kernel.org/r/20201203225257.2446-1-bongsu.jeon@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4fb7b98c

Merge branch 'perf-optimizations-for-tcp-recv-zerocopy' · 43be3a3c

Jakub Kicinski authored Dec 04, 2020

Arjun Roy says:

====================
Perf. optimizations for TCP Recv. Zerocopy

This patchset contains several optimizations for TCP Recv. Zerocopy.

Summarized:
1. It is possible that a read payload is not exactly page aligned -
that there may exist "straggler" bytes that we cannot map into the
caller's address space cleanly. For this, we allow the caller to
provide as argument a "hybrid copy buffer", turning
getsockopt(TCP_ZEROCOPY_RECEIVE) into a "hybrid" operation that allows
the caller to avoid a subsequent recvmsg() call to read the
stragglers.

2. Similarly, for "small" read payloads that are either below the size
of a page, or small enough that remapping pages is not a performance
win - we allow the user to short-circuit the remapping operations
entirely and simply copy into the buffer provided.

Some of the patches in the middle of this set are refactors to support
this "short-circuiting" optimization.

3. We allow the user to provide a hint that performing a page zap
operation (and the accompanying TLB shootdown) may not be necessary,
for the provided region that the kernel will attempt to map pages
into. This allows us to avoid this expensive operation while holding
the socket lock, which provides a significant performance advantage.

With all of these changes combined, "medium" sized receive traffic
(multiple tens to few hundreds of KB) see significant efficiency gains
when using TCP receive zerocopy instead of regular recvmsg(). For
example, with RPC-style traffic with 32KB messages, there is a roughly
15% efficiency improvement when using zerocopy. Without these changes,
there is a roughly 60-70% efficiency reduction with such messages when
employing zerocopy.
====================

Link: https://lore.kernel.org/r/20201202225349.935284-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

43be3a3c

net-zerocopy: Defer vm zap unless actually needed. · 94ab9eb9

Arjun Roy authored Dec 02, 2020

Zapping pages is required only if we are calling vm_insert_page into a
region where pages had previously been mapped. Receive zerocopy allows
reusing such regions, and hitherto called zap_page_range() before
calling vm_insert_page() in that range.

zap_page_range() can also be triggered from userspace with
madvise(MADV_DONTNEED). If userspace is configured to call this before
reusing a segment, or if there was nothing mapped at this virtual
address to begin with, we can avoid calling zap_page_range() under the
socket lock. That said, if userspace does not do that, then we are
still responsible for calling zap_page_range().

This patch adds a flag that the user can use to hint to the kernel
that a zap is not required. If the flag is not set, or if an older
user application does not have a flags field at all, then the kernel
calls zap_page_range as before. Also, if the flag is set but a zap is
still required, the kernel performs that zap as necessary. Thus
incorrectly indicating that a zap can be avoided does not change the
correctness of operation. It also increases the batchsize for
vm_insert_pages and prefetches the page struct for the batch since
we're about to bump the refcount.

An alternative mechanism could be to not have a flag, assume by
default a zap is not needed, and fall back to zapping if needed.
However, this would harm performance for older applications for which
a zap is necessary, and thus we implement it with an explicit flag
so newer applications can opt in.

When using RPC-style traffic with medium sized (tens of KB) RPCs, this
change yields an efficiency improvement of about 30% for QPS/CPU usage.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

94ab9eb9
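
The fallback behaviour described in the commit above (skip the zap when hinted, but still zap if the insert finds stale mappings) can be modelled with a toy region. Everything here is a hypothetical stand-in: the struct, the helpers, and the use of `-EBUSY` to signal an already-mapped range are assumptions for illustration, not the kernel implementation.

```c
#include <errno.h>

/* Toy model of a receive region; not a kernel structure. */
struct region {
	int mapped;   /* non-zero if pages are currently mapped here */
	int zaps;     /* how many times the region has been zapped */
};

/* Assumed behaviour: inserting into an already-mapped range fails. */
static int try_insert(struct region *r)
{
	if (r->mapped)
		return -EBUSY;
	r->mapped = 1;
	return 0;
}

static void zap(struct region *r)
{
	r->mapped = 0;
	r->zaps++;
}

/* Map pages, zapping up front unless the caller hinted it is unneeded. */
int map_pages(struct region *r, int hint_no_zap)
{
	int err;

	if (!hint_no_zap)
		zap(r);                 /* old behaviour: always zap first */

	err = try_insert(r);
	if (err == -EBUSY && hint_no_zap) {
		zap(r);                 /* hint was wrong: zap, then retry */
		err = try_insert(r);
	}

	return err;
}
```

In this model a wrong hint costs one failed insert plus the zap, but never changes the outcome, which mirrors the correctness claim in the commit message.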

net-zerocopy: Set zerocopy hint when data is copied · 0c3936d3

Arjun Roy authored Dec 02, 2020

Set zerocopy hint, even when falling back to copy, so that the
pending data can be efficiently received using zerocopy when
possible.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

0c3936d3

net-zerocopy: Introduce short-circuit small reads. · f21a3c48

Arjun Roy authored Dec 02, 2020

Sometimes, we may call tcp receive zerocopy when inq is 0,
or inq < PAGE_SIZE, or inq is generally small enough that
it is cheaper to copy rather than remap pages.

In these cases, we may want to either return early (inq=0) or
attempt to use the provided copy buffer to simply copy
the received data.

This allows us to save both system call overhead and
the latency of acquiring mmap_sem in read mode for cases where
it would be useless to do so.

This patchset enables this behaviour by:
1. Returning quickly if inq is 0.
2. Attempting to perform a regular copy if a hybrid copybuffer is
   provided and it is large enough to absorb all available bytes.
3. Returning quickly if no such buffer was provided and there are fewer
   than PAGE_SIZE bytes available.

For small RPC ping-pong workloads, normally we would have
1 getsockopt(), 1 recvmsg() and 1 sendmsg() call per RPC. With this
change, we remove the recvmsg() call entirely, reducing the syscall
overhead by about 33%. In testing with small (hundreds of bytes)
RPC traffic, this yields a syscall reduction of about 33% and
an efficiency gain of about 3-5% when defined as QPS/CPU Util.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

f21a3c48
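
The three rules listed in the commit above can be read as a small decision function. This is a sketch with hypothetical names, and it assumes that "no such buffer" in rule 3 means "no copy buffer large enough for all pending bytes"; the commit message does not spell that case out.

```c
#include <stddef.h>

enum zc_action {
	ZC_RETURN_EARLY,   /* nothing to map; leave any data to recvmsg() */
	ZC_COPY,           /* plain copy into the hybrid copy buffer */
	ZC_REMAP,          /* continue to the page-remapping path */
};

/*
 * inq:         bytes waiting in the receive queue
 * copybuf_len: size of the caller's hybrid copy buffer (0 if none)
 * page_size:   PAGE_SIZE on the running system
 */
enum zc_action zc_pick_action(size_t inq, size_t copybuf_len, size_t page_size)
{
	if (inq == 0)
		return ZC_RETURN_EARLY;   /* rule 1: nothing pending */

	if (inq <= copybuf_len)
		return ZC_COPY;           /* rule 2: buffer absorbs everything */

	if (inq < page_size)
		return ZC_RETURN_EARLY;   /* rule 3: too small to remap */

	return ZC_REMAP;
}
```

For a few hundred pending bytes and a 1 KB copy buffer the function picks `ZC_COPY`, which corresponds to the case where the separate recvmsg() call is no longer needed.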

net-zerocopy: Fast return if inq < PAGE_SIZE · 936ced41

Arjun Roy authored Dec 02, 2020

Sometimes, we may call tcp receive zerocopy when inq is 0,
or inq < PAGE_SIZE, in which case we cannot remap pages. In this case,
simply return the appropriate hint for regular copying without taking
mmap_sem.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

936ced41

net-zerocopy: Refactor frag-is-remappable test. · 98917cf0

Arjun Roy authored Dec 02, 2020

Refactor frag-is-remappable test for tcp receive zerocopy. This is
part of a patch set that introduces short-circuited hybrid copies
for small receive operations, which results in roughly 33% fewer
syscalls for small RPC scenarios.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

98917cf0

net-zerocopy: Refactor skb frag fast-forward op. · 7fba5309

Arjun Roy authored Dec 02, 2020

Refactor skb frag fast-forwarding for tcp receive zerocopy. This is
part of a patch set that introduces short-circuited hybrid copies
for small receive operations, which results in roughly 33% fewer
syscalls for small RPC scenarios.

skb_advance_to_frag(), given a skb and an offset into the skb,
iterates from the first frag for the skb until we're at the frag
specified by the offset. Assuming the offset provided refers to how
many bytes in the skb are already read, the returned frag points to
the next frag we may read from, while offset_frag is set to the number
of bytes from this frag that we have already read.

If frag is not null and offset_frag is equal to 0, then we may be able
to map this frag's page into the process address space with
vm_insert_page(). However, if offset_frag is not equal to 0, then we
cannot do so.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

7fba5309
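
The fast-forward described in the commit above can be sketched over a plain array of fragment sizes. This is a simplified stand-in, not the kernel's skb_advance_to_frag(): it ignores the linear part of a real skb and returns an index instead of a frag pointer, and all names are hypothetical.

```c
#include <stddef.h>

/*
 * frag_size[] holds the byte length of each fragment. Given how many
 * bytes have already been read (`offset`), return the index of the next
 * fragment to read from, or -1 if everything has been consumed.
 * *offset_frag receives the number of bytes of that fragment that were
 * already read.
 */
int advance_to_frag(const size_t *frag_size, int nr_frags,
		    size_t offset, size_t *offset_frag)
{
	int i;

	for (i = 0; i < nr_frags; i++) {
		if (offset < frag_size[i]) {
			*offset_frag = offset;
			return i;
		}
		offset -= frag_size[i];   /* skip a fully consumed fragment */
	}

	*offset_frag = 0;
	return -1;
}
```

With fragments of 4096, 4096 and 1000 bytes, an offset of 4096 lands on fragment 1 with `offset_frag` equal to 0 (a candidate for remapping), while an offset of 5000 lands on the same fragment with 904 bytes already read, so that fragment would have to be copied instead.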

net-tcp: Introduce tcp_recvmsg_locked(). · 2cd81161

Arjun Roy authored Dec 02, 2020

Refactor tcp_recvmsg() by splitting it into locked and unlocked
portions. Callers already holding the socket lock and not using
ERRQUEUE/cmsg/busy polling can simply call tcp_recvmsg_locked().
This is in preparation for a short-circuit copy performed by
TCP receive zerocopy for small (< PAGE_SIZE, or otherwise requested
by the user) reads.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2cd81161

net-zerocopy: Copy straggler unaligned data for TCP Rx. zerocopy. · 18fb76ed

Arjun Roy authored Dec 02, 2020

When TCP receive zerocopy does not successfully map the entire
requested space, it outputs a 'hint' that the caller should recvmsg().

Augment zerocopy to accept a user buffer that it tries to copy this
hint into - if it is possible to copy the entire hint, it will do so.
This elides a recvmsg() call for received traffic that isn't exactly
page-aligned in size.

This was tested with RPC-style traffic of arbitrary sizes. Normally,
each received message required at least one getsockopt() call, and one
recvmsg() call for the remaining unaligned data.

With this change, almost all of the recvmsg() calls are eliminated,
leading to a savings of about 25%-50% in number of system calls
for RPC-style workloads.
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

18fb76ed

Merge branch 'seg6-add-support-for-srv6-end-dt4-dt6-behavior' · 4be986c8

Jakub Kicinski authored Dec 04, 2020

Andrea Mayer says:

====================
seg6: add support for SRv6 End.DT4/DT6 behavior

This patchset provides support for the SRv6 End.DT4 and End.DT6 (VRF mode)
behaviors.

The SRv6 End.DT4 behavior is used to implement multi-tenant IPv4 L3 VPNs. It
decapsulates the received packets and performs IPv4 routing lookup in the
routing table of the tenant. The SRv6 End.DT4 Linux implementation leverages a
VRF device in order to force the routing lookup into the associated routing
table.
The SRv6 End.DT4 behavior is defined in the SRv6 Network Programming [1].

The Linux kernel already offers an implementation of the SRv6 End.DT6 behavior
which allows us to set up IPv6 L3 VPNs over SRv6 networks. This new
implementation of DT6 is based on the same VRF infrastructure already exploited
for implementing the SRv6 End.DT4 behavior. The aim of the new SRv6 End.DT6 in
VRF mode is to simplify the construction of IPv6 L3 VPN services in
the multi-tenant environment.
Currently, the two SRv6 End.DT6 implementations (legacy and VRF mode)
coexist seamlessly and can be chosen according to the context and the user
preferences.

- Patch 1 is needed to solve a pre-existing issue with tunneled packets
  when a sniffer is attached;

- Patch 2 improves the management of the seg6local attributes used by the
  SRv6 behaviors;

- Patch 3 adds support for optional attributes in SRv6 behaviors;

- Patch 4 introduces two callbacks used for customizing the
  creation/destruction of a SRv6 behavior;

- Patch 5 is the core patch that adds support for the SRv6 End.DT4
  behavior;

- Patch 6 introduces the VRF support for SRv6 End.DT6 behavior;

- Patch 7 adds the selftest for SRv6 End.DT4 behavior;

- Patch 8 adds the selftest for SRv6 End.DT6 (VRF mode) behavior.

Regarding iproute2, the support for the new "vrftable" attribute, required by
both SRv6 End.DT4 and End.DT6 (VRF mode) behaviors, is provided in a different
patchset that will follow shortly.

I would like to thank David Ahern for his support during the development of
this patchset.

[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming
====================

Link: https://lore.kernel.org/r/20201202130517.4967-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

4be986c8

selftests: add selftest for the SRv6 End.DT6 (VRF) behavior · 2bc03553

Andrea Mayer authored Dec 02, 2020

This selftest is designed for evaluating the new SRv6 End.DT6 (VRF) behavior
used, in this example, for implementing IPv6 L3 VPN use cases.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Signed-off-by: Paolo Lungaroni <paolo.lungaroni@cnit.it>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2bc03553

selftests: add selftest for the SRv6 End.DT4 behavior · 2195444e

Andrea Mayer authored Dec 02, 2020

This selftest is designed for evaluating the new SRv6 End.DT4 behavior
used, in this example, for implementing IPv4 L3 VPN use cases.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2195444e