Commits · b58e9fd483482a075bf6cad988c1ef43022417ce · nexedi / linux

11 Dec, 2019 24 commits

Merge branch 'sfp-copper-modules' · b58e9fd4

David S. Miller authored Dec 11, 2019

Russell King says:

====================
Add support for SFP+ copper modules

This series adds support for Copper SFP+ modules with Clause 45 PHYs.
Specifically the patches:

1. drop support for the probably never tested 100BASE-*X modules.
2. drop EEPROM ID from sfp_select_interface()
3. add more compliance code definitions from SFF-8024, renaming the
   existing definitions.
4. add module start/stop methods so phylink knows when a module is
   about to become active. The module start method is called after
   we have probed for a PHY on the module.
5. move start/stop of module PHY down into phylink using the new
   module start/stop methods.
6. add support for Clause 45 I2C accesses, tested with Methode DM7052.
   Other modules appear to use the same protocol, but slight
   differences, but I do not have those modules to test with.
   (if someone does, please holler!)
7. rearrange how we attach to PHYs so that we can support Clause 45
   PHYs with indeterminant interface modes.  (Clause 45 PHYs appear
   to like to change their PHY interface mode depending on the
   negotiated speed.)
8. add support for phylink to connect to a clause 45 PHY on a SFP
   module.
9. split the link_an_mode between the configured value and the
   currently selected mode value; some clause 45 PHYs have no
   capability to provide in-band negotiation.
10. split the link configuration on SFP module insertion in phylink
    so we can use it in other code paths.
11. delay MAC configuration for copper modules without a PHY to the
    module start method - after any module PHY has been probed.  If
    the module has a PHY, then we setup the MAC when the PHY is
    detected.
12. the Broadcom 84881 PHY does not support in-band negotiation even
    though it uses SGMII and 2500BASE-X.  Having the MAC operating
    with in-band negotiation enabled, even with AN bypass enabled,
    results in no link - Broadcom say that the host MAC must always
    be forced.
13. add support for the Broadcom 84881 PHY found on the Methode
    DM7052 module.
14. add support to SFP to probe for a Clause 45 PHY on copper SFP+
    modules.

v3: now bisectable!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b58e9fd4

net: sfp: add support for Clause 45 PHYs · 9a484621

Russell King authored Dec 11, 2019

Some SFP+ modules have a Clause 45 PHY onboard, which is accessible via
the normal I2C address.  Detect 10G BASE-T PHYs which may have an
accessible PHY and probe for it.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a484621

net: phy: add Broadcom BCM84881 PHY driver · 75f4d8d1

Russell King authored Dec 11, 2019

Add a rudimentary Clause 45 driver for the BCM84881 PHY, found on
Methode DM7052 SFPs.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

75f4d8d1

net: phylink: make Broadcom BCM84881 based SFPs work · 7adb5b21

Russell King authored Dec 11, 2019

The Broadcom BCM84881 does not appear to send the SGMII control word
when operating in SGMII mode, which causes network adapters to fail
to link with the PHY, or decide to operate at fixed 1G speed, even if
the PHY negotiated 100M.

Work around this by detecting the Broadcom BCM84881 and switch to phy
mode rather than inband mode.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

7adb5b21

net: phylink: delay MAC configuration for copper SFP modules · 52c95600

Russell King authored Dec 11, 2019

Knowing whether we need to delay the MAC configuration because a module
may have a PHY is useful to phylink to allow NBASE-T modules to work on
systems supporting no more than 2.5G speeds.

This commit allows us to delay such configuration until after the PHY
has been probed by recording the parsed capabilities, and if the module
may have a PHY, doing no more until the module_start() notification is
called.  At that point, we either have a PHY, or we don't.

We move the PHY-based setup a little later, and use the PHYs support
capabilities rather than the EEPROM parsed capabilities to determine
whether we can support the PHY.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

52c95600

net: phylink: split phylink_sfp_module_insert() · c0de2f47

Russell King authored Dec 11, 2019

Split out the configuration step from phylink_sfp_module_insert() so
we can re-use this later.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0de2f47

net: phylink: split link_an_mode configured and current settings · 24cf0e69

Russell King authored Dec 11, 2019

Split link_an_mode between the configured setting and the current
operating setting.  This is an important distinction to make when we
need to configure PHY mode for a plugged SFP+ module that does not
use in-band signalling.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

24cf0e69

net: phylink: support Clause 45 PHYs on SFP+ modules · e45d1f52

Russell King authored Dec 11, 2019

Some SFP+ modules have Clause 45 PHYs embedded on them, which need a
little more handling in order to ensure that they are correctly setup,
as they switch the PHY link mode according to the negotiated speed.

With Clause 22 PHYs, we assumed that they would operate in SGMII mode,
but this assumption is now false.  Adapt phylink to support Clause 45
PHYs on SFP+ modules.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

e45d1f52

net: phylink: re-split __phylink_connect_phy() · 938d44c2

Russell King authored Dec 11, 2019

In order to support Clause 45 PHYs on SFP+ modules, which have an
indeterminant phy interface mode, we need to be able to call
phylink_bringup_phy() with a different interface mode to that used when
binding the PHY. Reduce __phylink_connect_phy() to an attach operation,
and move the call to phylink_bringup_phy() to its call sites.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

938d44c2

net: mdio-i2c: add support for Clause 45 accesses · 6912b712

Russell King authored Dec 11, 2019

Some SFP+ modules have PHYs on them just like SFP modules do, except
they are Clause 45 PHYs.  The I2C protocol used to access them is
modified slightly in order to send the device address and 16-bit
register index.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

6912b712

net: sfp: move phy_start()/phy_stop() to phylink · 4882057a

Russell King authored Dec 11, 2019

Move phy_start() and phy_stop() into the module_start and module_stop
notifications in phylink, rather than having them in the SFP code.
This gives phylink responsibility for controlling the PHY, rather
than having SFP start and stop the PHY state machine.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

4882057a

net: sfp: add module start/stop upstream notifications · 74c551ca

Russell King authored Dec 11, 2019

When dealing with some copper modules, we can't positively know the
module capabilities are until we have probed the PHY. Without the full
capabilities, we may end up failing a module that we could otherwise
drive with a restricted set of capabilities.

An example of this would be a module with a NBASE-T PHY plugged into
a host that supports phy interface modes 2500BASE-X and SGMII. The
PHY supports 10GBASE-R, 5000BASE-X, 2500BASE-X, SGMII interface modes,
which means a subset of the capabilities are compatible with the host.

However, reading the module EEPROM leads us to believe that the module
only supports ethtool link mode 10GBASE-T, which is incompatible with
the host - and thus results in the module being rejected.

This patch adds an extra notification which are triggered after the
SFP module's PHY probe, and a corresponding notification just before
the PHY is removed.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

74c551ca

net: sfp: add more extended compliance codes · 0fbd26a9

Russell King authored Dec 11, 2019

SFF-8024 is used to define various constants re-used in several SFF
SFP-related specifications.  Split these constants from the enum, and
rename them to indicate that they're defined by SFF-8024.

Add and use updated SFF-8024 extended compliance code definitions for
10GBASE-T, 5GBASE-T and 2.5GBASE-T modules.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

0fbd26a9

net: sfp: derive interface mode from ethtool link modes · a4516c70

Russell King authored Dec 11, 2019

We don't need the EEPROM ID to derive the phy interface mode as we can
derive it merely from the ethtool link modes.  Remove the EEPROM ID
argument to sfp_select_interface().
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

a4516c70

net: sfp: remove incomplete 100BASE-FX and 100BASE-LX support · fa2de660

Russell King authored Dec 11, 2019

The 100BASE-FX and 100BASE-LX support assumes a PHY is present; this
is probably an incorrect assumption. In any case, sfp_parse_support()
will fail such a module. Let's stop pretending we support these
modules.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa2de660

cxgb4: add support for high priority filters · c2193999

Shahjada Abul Husain authored Dec 10, 2019

T6 has a separate region known as high priority filter region
that allows classifying packets going through ULD path. So,
query firmware for HPFILTER resources and enable the high
priority offload filter support when it is available.
Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2193999

enetc: remove variable 'tc_max_sized_frame' set but not used · 6525b5ef

Chen Wandun authored Dec 10, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/freescale/enetc/enetc_qos.c: In function enetc_setup_tc_cbs:
drivers/net/ethernet/freescale/enetc/enetc_qos.c:195:6: warning: variable tc_max_sized_frame set but not used [-Wunused-but-set-variable]

Fixes: c431047c ("enetc: add support Credit Based Shaper(CBS) for hardware offload")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6525b5ef

nfp: add support for TLV device stats · ca866ee8

Jakub Kicinski authored Dec 09, 2019

Device stats are currently hard coded in the PCI BAR0 layout.
Add a ability to read them from the TLV area instead.
Names for the stats are maintained by the driver, and their
meaning documented. This allows us to more easily add and
remove device stats.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca866ee8

tcp: Cleanup duplicate initialization of sk->sk_state. · 5000b28b

Kuniyuki Iwashima authored Dec 10, 2019

When a TCP socket is created, sk->sk_state is initialized twice as
TCP_CLOSE in sock_init_data() and tcp_init_sock(). The tcp_init_sock() is
always called after the sock_init_data(), so it is not necessary to update
sk->sk_state in the tcp_init_sock().

Before v2.1.8, the code of the two functions was in the inet_create(). In
the patch of v2.1.8, the tcp_v4/v6_init_sock() were added and the code of
initialization of sk->state was duplicated.
Signed-off-by: Kuniyuki Iwashima <kuni1840@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5000b28b

enetc: add software timestamping · 4caefbce

Michael Walle authored Dec 10, 2019

Provide a software TX timestamp and add it to the ethtool query
interface.

skb_tx_timestamp() is also needed if one would like to use PHY
timestamping.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

4caefbce

Merge branch 'tipc-introduce-variable-window-congestion-control' · bb9d8454

David S. Miller authored Dec 10, 2019

Jon Maloy says:

====================
tipc: introduce variable window congestion control

We improve thoughput greatly by introducing a variety of the Reno
congestion control algorithm at the link level.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bb9d8454

tipc: introduce variable window congestion control · 16ad3f40

Jon Maloy authored Dec 10, 2019

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering both 'slow
start', 'congestion avoidance', and 'fast recovery' modes.

- We introduce hard lower and upper window limits per link, still
  different and configurable per bearer type.

- We introduce a 'slow start theshold' variable, initially set to
  the maximum window size.

- We let a link start at the minimum congestion window, i.e. in slow
  start mode, and then let is grow rapidly (+1 per rceived ACK) until
  it reaches the slow start threshold and enters congestion avoidance
  mode.

- In congestion avoidance mode we increment the congestion window for
  each window-size number of acked packets, up to a possible maximum
  equal to the configured maximum window.

- For each non-duplicate NACK received, we drop back to fast recovery
  mode, by setting the both the slow start threshold to and the
  congestion window to (current_congestion_window / 2).

- If the timeout handler finds that the transmit queue has not moved
  since the previous timeout, it drops the link back to slow start
  and forces a probe containing the last sent sequence number to the
  sent to the peer, so that this can discover the stale situation.

This change does in reality have effect only on unicast ethernet
transport, as we have seen that there is no room whatsoever for
increasing the window max size for the UDP bearer.
For now, we also choose to keep the limits for the broadcast link
unchanged and equal.

This algorithm seems to give a 50-100% throughput improvement for
messages larger than MTU.
Suggested-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16ad3f40

tipc: eliminate more unnecessary nacks and retransmissions · d3b09995

Jon Maloy authored Dec 10, 2019

When we increase the link tranmsit window we often observe the following
scenario:

1) A STATE message bypasses a sequence of traffic packets and arrives
   far ahead of those to the receiver. STATE messages contain a
   'peers_nxt_snt' field to indicate which was the last packet sent
   from the peer. This mechanism is intended as a last resort for the
   receiver to detect missing packets, e.g., during very low traffic
   when there is no packet flow to help early loss detection.
3) The receiving link compares the 'peer_nxt_snt' field to its own
   'rcv_nxt', finds that there is a gap, and immediately sends a
   NACK message back to the peer.
4) When this NACKs arrives at the sender, all the requested
   retransmissions are performed, since it is a first-time request.

Just like in the scenario described in the previous commit this leads
to many redundant retransmissions, with decreased throughput as a
consequence.

We fix this by adding two more conditions before we send a NACK in
this sitution. First, the deferred queue must be empty, so we cannot
assume that the potential packet loss has already been detected by
other means. Second, we check the 'peers_snd_nxt' field only in probe/
probe_reply messages, thus turning this into a true mechanism of last
resort as it was really meant to be.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3b09995

tipc: eliminate gap indicator from ACK messages · 02288248

Jon Maloy authored Dec 10, 2019

When we increase the link send window we sometimes observe the
following scenario:

1) A packet #N arrives out of order far ahead of a sequence of older
   packets which are still under way. The packet is added to the
   deferred queue.
2) The missing packets arrive in sequence, and for each 16th of them
   an ACK is sent back to the receiver, as it should be.
3) When building those ACK messages, it is checked if there is a gap
   between the link's 'rcv_nxt' and the first packet in the deferred
   queue. This is always the case until packet number #N-1 arrives, and
   a 'gap' indicator is added, effectively turning them into NACK
   messages.
4) When those NACKs arrive at the sender, all the requested
   retransmissions are done, since it is a first-time request.

This sometimes leads to a huge amount of redundant retransmissions,
causing a drop in max throughput. This problem gets worse when we
in a later commit introduce variable window congestion control,
since it drops the link back to 'fast recovery' much more often
than necessary.

We now fix this by not sending any 'gap' indicator in regular ACK
messages. We already have a mechanism for sending explicit NACKs
in place, and this is sufficient to keep up the packet flow.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02288248

10 Dec, 2019 8 commits

ppp: Adjust indentation into ppp_async_input · 08cbc75f

Nathan Chancellor authored Dec 09, 2019

Clang warns:

../drivers/net/ppp/ppp_async.c:877:6: warning: misleading indentation;
statement is not part of the previous 'if' [-Wmisleading-indentation]
                                ap->rpkt = skb;
                                ^
../drivers/net/ppp/ppp_async.c:875:5: note: previous statement is here
                                if (!skb)
                                ^
1 warning generated.

This warning occurs because there is a space before the tab on this
line. Clean up this entire block's indentation so that it is consistent
with the Linux kernel coding style and clang no longer warns.

Fixes: 6722e78c ("[PPP]: handle misaligned accesses")
Link: https://github.com/ClangBuiltLinux/linux/issues/800Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08cbc75f

net: smc911x: Adjust indentation in smc911x_phy_configure · 5c61e223

Nathan Chancellor authored Dec 09, 2019

Clang warns:

../drivers/net/ethernet/smsc/smc911x.c:939:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
         if (!lp->ctl_rfduplx)
         ^
../drivers/net/ethernet/smsc/smc911x.c:936:2: note: previous statement
is here
        if (lp->ctl_rspeed != 100)
        ^
1 warning generated.

This warning occurs because there is a space after the tab on this line.
Remove it so that the indentation is consistent with the Linux kernel
coding style and clang no longer warns.

Fixes: 0a0c72c9 ("[PATCH] RE: [PATCH 1/1] net driver: Add support for SMSC LAN911x line of ethernet chips")
Link: https://github.com/ClangBuiltLinux/linux/issues/796Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c61e223

net: tulip: Adjust indentation in {dmfe, uli526x}_init_module · fe06bf3d

Nathan Chancellor authored Dec 09, 2019

Clang warns:

../drivers/net/ethernet/dec/tulip/uli526x.c:1812:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
        switch (mode) {
        ^
../drivers/net/ethernet/dec/tulip/uli526x.c:1809:2: note: previous
statement is here
        if (cr6set)
        ^
1 warning generated.

../drivers/net/ethernet/dec/tulip/dmfe.c:2217:3: warning: misleading
indentation; statement is not part of the previous 'if'
[-Wmisleading-indentation]
        switch(mode) {
        ^
../drivers/net/ethernet/dec/tulip/dmfe.c:2214:2: note: previous
statement is here
        if (cr6set)
        ^
1 warning generated.

This warning occurs because there is a space before the tab on these
lines. Remove them so that the indentation is consistent with the Linux
kernel coding style and clang no longer warns.

While we are here, adjust the default block in dmfe_init_module to have
a proper break between the label and assignment and add a space between
the switch and opening parentheses to avoid a checkpatch warning.

Fixes: e1c3e501 ("[PATCH] initialisation cleanup for ULI526x-net-driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/795Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe06bf3d

Merge branch 'dp83867-fix-fifo-depth' · 80bfc3b4

David S. Miller authored Dec 09, 2019

Dan Murphy says:

====================
Fix Tx/Rx FIFO depth for DP83867

The DP83867 supports both the RGMII and SGMII modes.  The Tx and Rx FIFO depths
are configurable in these modes but may not applicable for both modes.

When the device is configured for RGMII mode the Tx FIFO depth is applicable
and for SGMII mode both Tx and Rx FIFO depth settings are applicable.  When
the driver was originally written only the RGMII device was available and there
were no standard fifo-depth DT properties.

The patchset converts the special ti,fifo-depth property to the standard
tx-fifo-depth property while still allowing the ti,fifo-depth property to be
set as to maintain backward compatibility.

In addition to this change the rx-fifo-depth property support was added and only
written when the device is configured for SGMII mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

80bfc3b4

net: phy: dp83867: Add rx-fifo-depth and tx-fifo-depth · e02d1816

Dan Murphy authored Dec 09, 2019

This code changes the TI specific ti,fifo-depth to the common
tx-fifo-depth property.  The tx depth is applicable for both RGMII and
SGMII modes of operation.

rx-fifo-depth was added as well but this is only applicable for SGMII
mode.

So in summary
if RGMII mode write tx fifo depth only
if SGMII mode write both rx and tx fifo depths

If the property is not populated in the device tree then set the value
to the default values.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

e02d1816

dt-bindings: dp83867: Convert fifo-depth to common fifo-depth and make optional · 96ae38af

Dan Murphy authored Dec 09, 2019

Convert the ti,fifo-depth from a TI specific property to the common
tx-fifo-depth property.  Also add support for the rx-fifo-depth.

These are optional properties for this device and if these are not
available then the fifo depths are set to device default values.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reported-by: Adrian Bunk <bunk@kernel.org>
CC: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

96ae38af

net-tcp: Disable TCP ssthresh metrics cache by default · 65e6d901

Kevin(Yudong) Yang authored Dec 09, 2019

This patch introduces a sysctl knob "net.ipv4.tcp_no_ssthresh_metrics_save"
that disables TCP ssthresh metrics cache by default. Other parts of TCP
metrics cache, e.g. rtt, cwnd, remain unchanged.

As modern networks becoming more and more dynamic, TCP metrics cache
today often causes more harm than benefits. For example, the same IP
address is often shared by different subscribers behind NAT in residential
networks. Even if the IP address is not shared by different users,
caching the slow-start threshold of a previous short flow using loss-based
congestion control (e.g. cubic) often causes the future longer flows of
the same network path to exit slow-start prematurely with abysmal
throughput.

Caching ssthresh is very risky and can lead to terrible performance.
Therefore it makes sense to make disabling ssthresh caching by
default and opt-in for specific networks by the administrators.
This practice also has worked well for several years of deployment with
CUBIC congestion control at Google.
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Kevin(Yudong) Yang <yyd@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65e6d901

sctp: get netns from asoc and ep base · 4e7696d9

Xin Long authored Dec 09, 2019

Commit 31243461 ("sctp: cache netns in sctp_ep_common") set netns
in asoc and ep base since they're created, and it will never change.
It's a better way to get netns from asoc and ep base, comparing to
calling sock_net().

This patch is to replace them.

v1->v2:
  - no change.
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e7696d9

09 Dec, 2019 5 commits

net: sfp: avoid tx-fault with Nokia GPON module · 26c97a2d

Russell King authored Dec 09, 2019

The Nokia GPON module can hold tx-fault active while it is initialising
which can take up to 60s. Avoid this causing the module to be declared
faulty after the SFP MSA defined non-cooled module timeout.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

26c97a2d

qed: remove redundant assignments to rc · e70ac628

Colin Ian King authored Dec 09, 2019

The variable rc is assigned with a value that is never read and
it is re-assigned a new value later on.  The assignment is redundant
and can be removed.  Clean up multiple occurrances of this pattern.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e70ac628

NFC: port100: Convert cpu_to_le16(le16_to_cpu(E1) + E2) to use le16_add_cpu(). · 718eae27

Mao Wenan authored Dec 09, 2019

Convert cpu_to_le16(le16_to_cpu(frame->datalen) + len) to
use le16_add_cpu(), which is more concise and does the same thing.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

718eae27

Merge branch 'for-upstream' of... · 4a63ef71

David S. Miller authored Dec 09, 2019

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2019-12-09

Here's the first bluetooth-next pull request for 5.6:

 - Devicetree bindings updates for Broadcom controllers
 - Add support for PCM configuration for Broadcom controllers
 - btusb: Fixes for Realtek devices
 - butsb: A few other smaller fixes (mem leak & non-atomic allocation issue)

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4a63ef71

net: WireGuard secure network tunnel · e7096c13

Jason A. Donenfeld authored Dec 09, 2019

WireGuard is a layer 3 secure networking tunnel made specifically for
the kernel, that aims to be much simpler and easier to audit than IPsec.
Extensive documentation and description of the protocol and
considerations, along with formal proofs of the cryptography, are
available at:

  * https://www.wireguard.com/
  * https://www.wireguard.com/papers/wireguard.pdf

This commit implements WireGuard as a simple network device driver,
accessible in the usual RTNL way used by virtual network drivers. It
makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of
networking subsystem APIs. It has a somewhat novel multicore queueing
system designed for maximum throughput and minimal latency of encryption
operations, but it is implemented modestly using workqueues and NAPI.
Configuration is done via generic Netlink, and following a review from
the Netlink maintainer a year ago, several high profile userspace tools
have already implemented the API.

This commit also comes with several different tests, both in-kernel
tests and out-of-kernel tests based on network namespaces, taking profit
of the fact that sockets used by WireGuard intentionally stay in the
namespace the WireGuard interface was originally created, exactly like
the semantics of userspace tun devices. See wireguard.com/netns/ for
pictures and examples.

The source code is fairly short, but rather than combining everything
into a single file, WireGuard is developed as cleanly separable files,
making auditing and comprehension easier. Things are laid out as
follows:

  * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the
    cryptographic aspects of the protocol, and are mostly data-only in
    nature, taking in buffers of bytes and spitting out buffers of
    bytes. They also handle reference counting for their various shared
    pieces of data, like keys and key lists.

  * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for
    ratelimiting certain types of cryptographic operations in accordance
    with particular WireGuard semantics.

  * allowedips.[ch], peerlookup.[ch]: The main lookup structures of
    WireGuard, the former being trie-like with particular semantics, an
    integral part of the design of the protocol, and the latter just
    being nice helper functions around the various hashtables we use.

  * device.[ch]: Implementation of functions for the netdevice and for
    rtnl, responsible for maintaining the life of a given interface and
    wiring it up to the rest of WireGuard.

  * peer.[ch]: Each interface has a list of peers, with helper functions
    available here for creation, destruction, and reference counting.

  * socket.[ch]: Implementation of functions related to udp_socket and
    the general set of kernel socket APIs, for sending and receiving
    ciphertext UDP packets, and taking care of WireGuard-specific sticky
    socket routing semantics for the automatic roaming.

  * netlink.[ch]: Userspace API entry point for configuring WireGuard
    peers and devices. The API has been implemented by several userspace
    tools and network management utility, and the WireGuard project
    distributes the basic wg(8) tool.

  * queueing.[ch]: Shared function on the rx and tx path for handling
    the various queues used in the multicore algorithms.

  * send.c: Handles encrypting outgoing packets in parallel on
    multiple cores, before sending them in order on a single core, via
    workqueues and ring buffers. Also handles sending handshake and cookie
    messages as part of the protocol, in parallel.

  * receive.c: Handles decrypting incoming packets in parallel on
    multiple cores, before passing them off in order to be ingested via
    the rest of the networking subsystem with GRO via the typical NAPI
    poll function. Also handles receiving handshake and cookie messages
    as part of the protocol, in parallel.

  * timers.[ch]: Uses the timer wheel to implement protocol particular
    event timeouts, and gives a set of very simple event-driven entry
    point functions for callers.

  * main.c, version.h: Initialization and deinitialization of the module.

  * selftest/*.h: Runtime unit tests for some of the most security
    sensitive functions.

  * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing
    script using network namespaces.

This commit aims to be as self-contained as possible, implementing
WireGuard as a standalone module not needing much special handling or
coordination from the network subsystem. I expect for future
optimizations to the network stack to positively improve WireGuard, and
vice-versa, but for the time being, this exists as intentionally
standalone.

We introduce a menu option for CONFIG_WIREGUARD, as well as providing a
verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

e7096c13

08 Dec, 2019 3 commits

Linux 5.5-rc1 · e42617b8
Linus Torvalds authored Dec 08, 2019

e42617b8

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 95e6ba51

Linus Torvalds authored Dec 08, 2019

Pull networking fixes from David Miller:

 1) More jumbo frame fixes in r8169, from Heiner Kallweit.

 2) Fix bpf build in minimal configuration, from Alexei Starovoitov.

 3) Use after free in slcan driver, from Jouni Hogander.

 4) Flower classifier port ranges don't work properly in the HW offload
    case, from Yoshiki Komachi.

 5) Use after free in hns3_nic_maybe_stop_tx(), from Yunsheng Lin.

 6) Out of bounds access in mqprio_dump(), from Vladyslav Tarasiuk.

 7) Fix flow dissection in dsa TX path, from Alexander Lobakin.

 8) Stale syncookie timestampe fixes from Guillaume Nault.

[ Did an evil merge to silence a warning introduced by this pull - Linus ]

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
  r8169: fix rtl_hw_jumbo_disable for RTL8168evl
  net_sched: validate TCA_KIND attribute in tc_chain_tmplt_add()
  r8169: add missing RX enabling for WoL on RTL8125
  vhost/vsock: accept only packets with the right dst_cid
  net: phy: dp83867: fix hfs boot in rgmii mode
  net: ethernet: ti: cpsw: fix extra rx interrupt
  inet: protect against too small mtu values.
  gre: refetch erspan header from skb->data after pskb_may_pull()
  pppoe: remove redundant BUG_ON() check in pppoe_pernet
  tcp: Protect accesses to .ts_recent_stamp with {READ,WRITE}_ONCE()
  tcp: tighten acceptance of ACKs not matching a child socket
  tcp: fix rejected syncookies due to stale timestamps
  lpc_eth: kernel BUG on remove
  tcp: md5: fix potential overestimation of TCP option space
  net: sched: allow indirect blocks to bind to clsact in TC
  net: core: rename indirect block ingress cb function
  net-sysfs: Call dev_hold always in netdev_queue_add_kobject
  net: dsa: fix flow dissection on Tx path
  net/tls: Fix return values to avoid ENOTSUPP
  net: avoid an indirect call in ____sys_recvmsg()
  ...

95e6ba51

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 138f371d

Linus Torvalds authored Dec 08, 2019

Pull more SCSI updates from James Bottomley:
 "Eleven patches, all in drivers (no core changes) that are either minor
  cleanups or small fixes.

  They were late arriving, but still safe for -rc1"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: MAINTAINERS: Add the linux-scsi mailing list to the ISCSI entry
  scsi: megaraid_sas: Make poll_aen_lock static
  scsi: sd_zbc: Improve report zones error printout
  scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI
  scsi: qla2xxx: unregister ports after GPN_FT failure
  scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan
  scsi: pm80xx: Remove unused include of linux/version.h
  scsi: pm80xx: fix logic to break out of loop when register value is 2 or 3
  scsi: scsi_transport_sas: Fix memory leak when removing devices
  scsi: lpfc: size cpu map by last cpu id set
  scsi: ibmvscsi_tgt: Remove unneeded variable rc

138f371d