- 30 Apr, 2018 40 commits
-
-
David S. Miller authored
Ido Schimmel says:
====================
mlxsw: SPAN: Support routes pointing at bridges

Petr says:

When mirroring to a gretap or ip6gretap netdevice, the route that directs the encapsulated packets can reference a bridge. In that case, in the software model, the packet is switched. Thus when offloading mirroring like that, take into consideration the FDB, STP, and PVID configured at the bridge, and whether that VLAN ID should be tagged on egress.

Patch #1 introduces functions to get the bridge PVID and VLAN flags, and to look up an FDB entry.

Patches #2 and #3 refactor some existing code and introduce a new accessor function.

With patches #4 and #5, mlxsw calls mlxsw_sp_span_respin() on switchdev events as well. There is no impact yet, because a bridge as an underlay device is still not allowed. That is implemented in patch #6, which uses the new interfaces to figure out on which port the mirroring should be configured, and whether the mirrored packets should be VLAN-tagged and how.

Changes from v2 to v3:
- Rename the suite of bridge accessor functions to br_vlan_get_pvid(), br_vlan_get_info() and br_fdb_find_port(). The _get bit is to avoid clashing with an existing static function.

Changes from v1 to v2:
- Change the suite of bridge accessor functions to br_vlan_pvid_rtnl(), br_vlan_info_rtnl(), br_fdb_find_port_rtnl().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the underlay address (i.e. the remote address of the tunnel) may be routed to a bridge. In that case, look up the resolved neighbor Ethernet address in that bridge's FDB. Then configure the offload to direct the mirrored traffic to that port, possibly with tagging. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Changes to switchdev artifacts can make a SPAN entry offloadable or unoffloadable. To that end:

- Listen to SWITCHDEV_FDB_*_TO_BRIDGE notifications in addition to the *_TO_DEVICE ones, to catch whatever activity is sent to the bridge (likely by mlxsw itself). On each FDB notification, respin SPAN to reconcile it with the FDB changes.
- Also respin on switchdev port attribute changes (which currently covers changes to the STP state of ports) and on port object additions and removals.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Since switchdev events can trigger SPAN respin, it is necessary that the data structures are available. Register SPAN first, with a commentary on what the dependencies are. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Publish the existing function mlxsw_sp_bridge_port_find(), and add another service accessor mlxsw_sp_bridge_port_stp_state(). Publish both in a new file spectrum_switchdev.h. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Instead of duplicating the decision regarding port forwarding state made by mlxsw_sp_port_vid_stp_set(), extract the decision-making into a new function and reuse it. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Petr Machata authored
Add a couple of new functions to allow querying the FDB and VLAN settings of a bridge. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
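A rough usage sketch of the new accessors (prototypes recalled from include/linux/if_bridge.h as added by this series; whether a given call takes the bridge or the bridge-port netdevice should be checked against the actual patch; br_dev, addr and vid are placeholders):

  #include <linux/if_bridge.h>
  #include <linux/netdevice.h>

  /* Sketch only: resolve the egress bridge port and VLAN view for SPAN. */
  static struct net_device *
  span_bridge_port_sketch(struct net_device *br_dev, const unsigned char *addr,
                          u16 vid, u16 *p_pvid, struct bridge_vlan_info *p_vinfo)
  {
          if (br_vlan_get_pvid(br_dev, p_pvid))        /* bridge PVID, under rtnl */
                  return NULL;
          if (br_vlan_get_info(br_dev, vid, p_vinfo))  /* e.g. BRIDGE_VLAN_INFO_UNTAGGED */
                  return NULL;
          return br_fdb_find_port(br_dev, addr, vid);  /* port the FDB entry points at */
  }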
-
Ahmed Abdelsalam authored
seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the flowlabel from a given skb. It relies on skb_get_hash(), which eventually calls __skb_flow_dissect() to extract the flow_keys struct values from the skb. In case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(), skb_reset_network_header(), and skb_mac_header_rebuild() results in a flow_keys struct with all key values set to zero.

This patch calls seg6_make_flowlabel() before resetting the headers of the skb, so that the right key values are extracted. The extracted key values depend on the type of the inner packet, as follows:

1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of the inner packet.
2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port.
3) L2 traffic: depends on what kind of traffic is carried in the L2 frame. IPv6 and IPv4 traffic work as discussed in 1) and 2).

Here is a hex dump of struct flow_keys for IPv4 and IPv6 traffic:

10.100.1.100:47302 > 30.0.0.2:5001
00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
00000020: 0a 64 01 64

fc00:a1::a > b2::2
00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
00000030: 00 00 00 00 00 00 00 00 00 00 00 0a

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
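A rough sketch of the reordering inside seg6_do_srh_encap() (variable and argument names are illustrative; see the actual patch for the exact code):

  /* Compute the flow label while the inner headers are still intact... */
  flowlabel = seg6_make_flowlabel(net, skb, inner_hdr);

  skb_push(skb, tot_len);
  skb_reset_network_header(skb);
  skb_mac_header_rebuild(skb);

  /* ...and use the precomputed value when building the outer IPv6 header. */
  ip6_flow_hdr(hdr, 0, flowlabel);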
-
YueHaibing authored
Use the helper __skb_put_zero() to replace the pattern of __skb_put() followed by memset(). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
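For illustration, the replaced pattern looks roughly like this (skb, len and ptr are placeholders, not the exact hunk from the patch):

  /* Before: reserve tail room, then zero it by hand. */
  ptr = __skb_put(skb, len);
  memset(ptr, 0, len);

  /* After: __skb_put_zero() reserves and zeroes the area in one call. */
  ptr = __skb_put_zero(skb, len);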
-
William Tu authored
Currently the truncated bit is set only when the mirrored packet is larger than the mtu. In certain cases, the packet might already have been truncated before being sent to the erspan tunnel. For this case, the patch detects whether the IP header's total length is larger than the actual skb->len. If so, this indicates that the mirrored packet was truncated, and the erspan truncate bit is set. I tested the patch using the bpf_skb_change_tail helper function to shrink the packet size and send it to the erspan tunnel. Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com> Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
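A minimal sketch of the new check, assuming an IPv4 inner packet (the actual patch sits in the erspan build path and covers further cases):

  /* If the IP header claims more bytes than the skb actually carries, the
   * packet was already truncated before reaching the tunnel, so flag it. */
  if (skb->protocol == htons(ETH_P_IP) &&
      ntohs(ip_hdr(skb)->tot_len) > skb->len)
          truncate = true;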
-
David S. Miller authored
Heiner Kallweit says: ==================== r8169: further improvements w/o functional change This series aims at further improving and simplifying the code w/o any intended functional changes. Series was tested on: RTL8169sb, RTL8168d, RTL8168e-vl ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
The chip-specific init code includes quite a few calls which are identical for all chips, so move these calls to tp->hw_start(). In addition, move rtl_set_rx_max_size() a little to make sure it's defined before it's used. Unfortunately the diff generated by git is a little hard to read. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
__dev_open() calls the ndo_set_rx_mode callback anyway, so we don't have to do it here too. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Currently done:

- if mac_version in (01, 02, 03, 04)
      RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
- if mac_version in (01, 02, 03, 04)
      rtl_set_rx_tx_config_registers(tp);
- if mac_version not in (01, 02, 03, 04)
      RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
      rtl_set_rx_tx_config_registers(tp);

So we do exactly the same independent of chip version and can simplify the code. In addition remove the call to rtl_init_rxcfg(), it's called in rtl_init_one() already and the set bits are never touched later. rtl_init_8168/8101 don't include this call either.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
Both quirk masks are the same, so we can merge them. The quirk mask includes most bits so it's actually easier to define a mask with the bits to keep. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
tp->cp_cmd is supposed to reflect the current value of the CPlusCmd register. Several (quite old) changes however directly change this register w/o updating tp->cp_cmd. Also we have places in the code reading this register where we could use the cached value.

In addition:
- Properly initialize tp->cp_cmd with the register value.
- In rtl_hw_start_8169, remove one setting of PCIMulRW because it's set unconditionally anyway a few lines later.
- In rtl_hw_start_8168, properly mask out the INTT bits before setting INTT_1. So far we rely on both bits being zero.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
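A small sketch of the caching pattern being enforced (register and macro names as used elsewhere in this driver; the exact hunks differ):

  /* Initialize the cache from the hardware once... */
  tp->cp_cmd = RTL_R16(tp, CPlusCmd);

  /* ...then always modify the cached value and write it back, so that
   * tp->cp_cmd and the CPlusCmd register never go out of sync. */
  tp->cp_cmd |= PCIMulRW;
  RTL_W16(tp, CPlusCmd, tp->cp_cmd);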
-
Heiner Kallweit authored
Use a proper constant for INTT bit mask. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
__rtl8169_set_features() is used in rtl8169_set_features() only, so we can inline it.

In addition:
- Remove the (features ^ dev->features) check; __netdev_update_features() already checks that the requested features differ from the current ones.
- Don't mask out unsupported flags, there's no benefit in it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Heiner Kallweit authored
RxChkSum and RxVlan aren't touched outside __rtl8169_set_features (except in probe), so they are always in sync with dev->features. And the RxConfig flags are set in rtl_set_rx_mode() which is called via dev_set_rx_mode() from __dev_open(). Therefore we can safely remove this call. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
Trivial fix to spelling mistake in oct_stats_strings text Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Intiyaz Basha says:
====================
liquidio: enhanced ethtool --set-channels feature

For the ethtool --set-channels feature, the liquidio driver currently accepts the max combined value as the queue count configured during driver load time, where the max combined count is the total count of input and output queues. This limitation applies only when SR-IOV is enabled, that is, when VFs are created for the PF. If SR-IOV is not enabled, the driver can configure the max supported (64) queues.

This series of patches enhances the driver to accept the max supported queue count for ethtool --set-channels.

Changes in V2:
Only patch #6 was changed, to fix these Sparse warnings reported by the kbuild test robot:
lio_ethtool.c:848:5: warning: symbol 'lio_23xx_reconfigure_queue_count' was not declared. Should it be static?
lio_ethtool.c:877:22: warning: incorrect type in assignment (different base types)
lio_ethtool.c:878:22: warning: incorrect type in assignment (different base types)
lio_ethtool.c:879:22: warning: incorrect type in assignment (different base types)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Enhancing driver to accept max supported queues for ethtool --set-channels Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moved the common function setup_glists to lio_core.c and renamed it to lio_setup_glists. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moving common definition octnic_gather to octeon_network.h Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moved common function delete_glists to lio_core.c and renamed it to lio_delete_glists Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moved common function list_delete_head to octeon_network.h and renamed it to lio_list_delete_head Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Intiyaz Basha authored
Moved common function if_cfg_callback to lio_core.c and renamed it to lio_if_cfg_callback. Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
We have about 53 netdev_features_t bits defined and counting; add a build time check to catch when a u64 type will not be enough and we will have to convert that to a bitmap. This is done in register_netdevice() for convenience. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
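The check presumably boils down to something like this (a sketch of a compile-time assertion placed in register_netdevice(); the exact expression in the patch may differ):

  /* Break the build once the number of netdev feature bits no longer fits
   * into the 64-bit netdev_features_t type. */
  BUILD_BUG_ON(sizeof(netdev_features_t) * BITS_PER_BYTE < NETDEV_FEATURE_COUNT);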
-
David S. Miller authored
Alexander Duyck says: ==================== Clean up users of skb_tx_hash and __skb_tx_hash I am in the process of doing some work to try and enable macvlan Tx queue selection without using ndo_select_queue. As a part of that I will likely need to make changes to skb_tx_hash. As such this is a clean up or refactor of the two spots where the function has been used. In both cases it didn't really seem like the function was being used correctly, so I have updated both code paths to not make use of the function. My current development environment doesn't have an mlx4 or OPA vnic available, so the changes to those have been build tested only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
I am dropping the export of __skb_tx_hash as after my patches nobody is using it outside of the net/core/dev.c file. In addition I am renaming and repurposing it to just be a static declaration of skb_tx_hash since that was the only user for it at this point. By doing this the compiler can inline it into __netdev_pick_tx as that will improve performance. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
The code in the fallback path has supported XDP in conjunction with the Tx traffic classification for TCs for over a year now. So instead of just calling skb_tx_hash for every packet we are better off using the fallback since that will record the Tx queue to the socket and then that can be used instead of having to recompute the hash every time. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Duyck authored
This patch is meant to clean up how the opa_vnic is obtaining entropy from Tx packets. The code as it was written was claiming to get 16 bits of hash, but from what I can tell it was only ever actually getting 14 bits as it was limited to 0 - (2^15 - 1). It then was folding the result to get an 8 bit value for entropy. Instead of throwing away all that input I am cutting out the middle man and instead having the code call skb_get_hash directly and then folding the 32 bit value into an 8 bit value using a pair of shifts and XOR operations. Execution wise this new approach should provide more entropy and be faster, since we are bypassing the reciprocal multiplication to reduce the 32b value to 16b and instead just using a shift/XOR combination. In addition we can drop the unneeded adapter value from the call to get the entropy since the netdev itself isn't even needed. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
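A minimal sketch of the folding described above (the function name is illustrative, not the driver's):

  /* Fold a 32-bit flow hash into 8 bits of entropy using shifts and XORs. */
  static inline u8 vnic_entropy_sketch(struct sk_buff *skb)
  {
          u32 hash = skb_get_hash(skb);

          hash ^= hash >> 16;     /* fold 32 bits into 16 */
          hash ^= hash >> 8;      /* fold 16 bits into 8 */
          return (u8)hash;
  }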
-
David S. Miller authored
Raghuram Chary J says: ==================== lan78xx updates along with Fixed phy Support This series of patches handles a few modifications in the driver and adds support for a fixed PHY. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raghuram Chary J authored
Modify the error messages when phy registration fails. Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raghuram Chary J authored
Remove driver version info from the lan78xx driver. Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Raghuram Chary J authored
Adding Fixed PHY support to the lan78xx driver. Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Eric Dumazet says:
====================
tcp: mmap: rework zerocopy receive

syzbot reported a lockdep issue caused by tcp mmap() support. I implemented Andy Lutomirski's nice suggestions to resolve the issue and increase scalability as well.

The first patch adds a new getsockopt() operation and changes mmap() behavior. The second patch changes the tcp_mmap reference program.

v4: tcp mmap() support depends on CONFIG_MMU, as the kbuild bot told us.
v3: change TCP_ZEROCOPY_RECEIVE to be a getsockopt() option instead of setsockopt(), feedback from Ka-Cheon Poon.
v2: Added a missing page align of zc->length in tcp_zerocopy_receive(). Properly clear zc->recv_skip_hint in case the user request was completed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
After the prior kernel change, mmap() on a TCP socket only reserves a VMA. We have to use getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) to perform the transfer of pages from skbs in the TCP receive queue into such a VMA.

struct tcp_zerocopy_receive {
	__u64 address;		/* in: address of mapping */
	__u32 length;		/* in/out: number of bytes to map/mapped */
	__u32 recv_skip_hint;	/* out: amount of bytes to skip */
};

After a successful getsockopt(...TCP_ZEROCOPY_RECEIVE...), @length contains the number of bytes that were mapped, and @recv_skip_hint contains the number of bytes that should be read using conventional read()/recv()/recvmsg() system calls, to skip a sequence of bytes that can not be mapped because they are not properly page aligned.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
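A hedged userspace sketch of the receive path described above, assuming fd is a connected TCP socket, addr was returned by an earlier mmap() on it, and the installed kernel headers define TCP_ZEROCOPY_RECEIVE:

  #include <string.h>
  #include <sys/socket.h>
  #include <netinet/in.h>   /* IPPROTO_TCP */
  #include <linux/tcp.h>    /* struct tcp_zerocopy_receive, TCP_ZEROCOPY_RECEIVE */

  static void zerocopy_receive_sketch(int fd, void *addr, unsigned int map_len)
  {
          struct tcp_zerocopy_receive zc;
          socklen_t zc_len = sizeof(zc);

          memset(&zc, 0, sizeof(zc));
          zc.address = (__u64)(unsigned long)addr;  /* start of the reserved VMA */
          zc.length = map_len;                      /* bytes we would like mapped */

          if (getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, &zc, &zc_len))
                  return;  /* fall back to regular recv()/read() */

          /* zc.length bytes are now mapped at addr; zc.recv_skip_hint bytes must
           * still be consumed with recv(), since they could not be page aligned. */
  }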
-
Eric Dumazet authored
When adding the tcp mmap() implementation, I forgot that the socket lock had to be taken before current->mm->mmap_sem. syzbot eventually caught the bug.

Since we can not lock the socket in the tcp mmap() handler, we have to split the operation into two phases.

1) mmap() on a tcp socket simply reserves VMA space, and nothing else. This operation does not involve any TCP locking.

2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements the transfer of pages from skbs to one VMA. This operation only uses down_read(&current->mm->mmap_sem) after holding the TCP lock, thus solving the lockdep issue.

This new implementation was suggested by Andy Lutomirski with great details.

Benefits are:

- Better scalability, in case multiple threads reuse VMAs (without mmap()/munmap() calls), since mmap_sem won't be write locked.
- Better error recovery. The previous mmap() model had to provide the expected size of the mapping. If for some reason one part could not be mapped (partial MSS), the whole operation had to be aborted. With the tcp_zerocopy_receive struct, the kernel can report how many bytes were successfully mapped, and how many bytes should be read to skip the problematic sequence.
- No more memory allocation to hold an array of page pointers. 16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
- skbs are freed while mmap_sem has been released.

The following patch makes the change in the tcp_mmap tool to demonstrate one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...).

Note that memcg might require additional changes.

Fixes: 93ab6cc6 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Cc: linux-mm@kvack.org
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Vivien Didelot says: ==================== net: dsa: mv88e6xxx: remove Global 2 setup Parts of the mv88e6xxx driver still write arbitrary registers of different banks at setup time, which is misleading especially when supporting multiple device models. This patchset moves the setup of two features into the top level mv88e6xxx_setup function and kills the old Global 2 register bank setup function. It brings no functional changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-