Commits · b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff · nexedi / linux

06 Apr, 2017 37 commits

L2TP:Adjust intf MTU, add underlay L3, L2 hdrs. · b784e7eb

R. Parameswaran authored Apr 05, 2017

Existing L2TP kernel code does not derive the optimal MTU for Ethernet
pseudowires and instead leaves this to a userspace L2TP daemon or
operator. If an MTU is not specified, the existing kernel code chooses
an MTU that does not take account of all tunnel header overheads, which
can lead to unwanted IP fragmentation. When L2TP is used without a
control plane (userspace daemon), we would prefer that the kernel does a
better job of choosing a default pseudowire MTU, taking account of all
tunnel header overheads, including IP header options, if any. This patch
addresses this.

Change-set here uses the new kernel function, kernel_sock_ip_overhead(),
to factor the outer IP overhead on the L2TP tunnel socket (including
IP Options, if any) when calculating the default MTU for an Ethernet
pseudowire, along with consideration of the inner Ethernet header.
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b784e7eb

New kernel function to get IP overhead on a socket. · 113c3075

R. Parameswaran authored Apr 05, 2017

A new function, kernel_sock_ip_overhead(), is provided
to calculate the cumulative overhead imposed by the IP
Header and IP options, if any, on a socket's payload.
The new function returns an overhead of zero for sockets
that do not belong to the IPv4 or IPv6 address families.
This is used in the L2TP code path to compute the
total outer IP overhead on the L2TP tunnel socket when
calculating the default MTU for Ethernet pseudowires.
Signed-off-by: R. Parameswaran <rparames@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

113c3075

net: ethernet: wiznet: avoid format string exposure · 129858fa

Kees Cook authored Apr 05, 2017

While unlikely, this makes sure any format strings in the device name
can't exposure information via the resulting workqueue name.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

129858fa

qlge: avoid format string exposure in workqueue · df656bf6

Kees Cook authored Apr 05, 2017

While unlikely, this makes sure the workqueue name won't be processed
as a format string.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

df656bf6

qed: Correct MSI-x for storage · 2f782278

Mintz, Yuval authored Apr 05, 2017

When qedr is enabled, qed would try dividing the msi-x vectors between
L2 and RoCE, starting with L2 and providing it with sufficient vectors
for its queues.

Problem is qed would also do that for storage partitions, and as those
don't need queues it would lead qed to award those partitions with 0
msi-x vectors, causing them to believe theye're using INTa and
preventing them from operating.

Fixes: 51ff1725 ("qed: Add support for RoCE hw init")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f782278

Merge branch 'dsa-loop-fixes' · 9e9623ab

David S. Miller authored Apr 06, 2017

Florian Fainelli says:

====================
net: dsa: Mock-up driver couple fixes

Thanks to Dan's static checker, a bunch of small issues were found in the code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

9e9623ab

net: dsa: loop: Initialize err in dsa_loop_vlan_dump · d1db799e

Florian Fainelli authored Apr 05, 2017

Dan's static checker reported the following:

	drivers/net/dsa/dsa_loop.c:223 dsa_loop_port_vlan_dump()
	error: uninitialized symbol 'err'.

which could happen if we do hit the continue statement for each iteration of
the loop. Initialize err to 0 here.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 98cd1552 ("net: dsa: Mock-up driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1db799e

net: dsa: loop: Fix uninitialized pvid variable · 5865ccce

Florian Fainelli authored Apr 05, 2017

Dan's static analyzer reported the following:

	drivers/net/dsa/dsa_loop.c:181 dsa_loop_port_vlan_del()
	error: XXX uninitialized symbol 'pvid'.

we were missing the assignment of pvid to ps->vid, so add that.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 98cd1552 ("net: dsa: Mock-up driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5865ccce

net/sched: Removed unused vlan actions definition · 91e91bef

Or Gerlitz authored Apr 05, 2017

Commit c7e2b968 "sched: introduce vlan action" added both the
UAPI values for the vlan actions (TCA_VLAN_ACT_) and these two
in-kernel ones which are not used, remove them.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91e91bef

mlx4: trust shinfo->gso_segs · 75d04aa3

Eric Dumazet authored Apr 05, 2017

mlx4 is the only driver in the tree making a point to recompute
shinfo->gso_segs.

Lets remove superfluous code.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

75d04aa3

qed: fix missing break in OOO_LB_TC case · 827d240a

Colin Ian King authored Apr 05, 2017

There seems to be a missing break on the OOO_LB_TC case, pq_id
is being assigned and then re-assigned on the fall through default
case and that seems suspect.

Detected by CoverityScan, CID#1424402 ("Missing break in switch")

Fixes: b5a9ee7c ("qed: Revise QM cofiguration")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

827d240a

net/mlx5e: fix build error without CONFIG_SYSFS · 053ee0a7

Tobias Regnery authored Apr 05, 2017

Commit 9008ae07 ("net/mlx5e: Minimize mlx5e_{open/close}_locked")
copied the calls to netif_set_real_num_{tx,rx}_queues from
mlx5e_open_locked to mlx5e_activate_priv_channels and wraps them in an
if condition to test for netdev->real_num_{tx,rx}_queues.

But netdev->real_num_rx_queues is conditionally compiled in if CONFIG_SYSFS
is set. Without CONFIG_SYSFS the build fails:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_activate_priv_channels':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2515:12: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

Fix this by unconditionally call netif_set_real_num{tx,rx}_queues like before
commit 9008ae07.

Fixes: 9008ae07 ("net/mlx5e: Minimize mlx5e_{open/close}_locked")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

053ee0a7

af_unix: Use designated initializers · 82fe0d2b

Kees Cook authored Apr 04, 2017

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, and the initializer fixes
were extracted from grsecurity. In this case, NULL initialize with { }
instead of undesignated NULLs.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

82fe0d2b

Merge branch 'ftgmac100-rework-batch-1-Link-and-Interrupts' · 540c86f3

David S. Miller authored Apr 06, 2017

Benjamin Herrenschmidt says:

====================
ftgmac100: Rework batch 1 - Link & Interrupts

This is version 2 of the first batch of updates to the
ftgmac100 driver.

Essentially:

 - A few misc cleanups
 - Fixing link speed & duplex handling (including dealing with
   an Aspeed requirement to double reset the controller when
   the speed changes)
 - And addition of a reset task workqueue which will be used
   for delaying the re-initialization of the controller
 - Fixing a number of issues with how interrupts and NAPI
   are dealt with.

Subsequent batches will rework and improve the rx path, the
tx path, and add a bunch of features and fixes.

Version 2 addresses some review comments to patches 5 and 10
(see version history in the respective emails).
====================
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

540c86f3

ftgmac100: Rework NAPI & interrupts handling · 10cbd640

Benjamin Herrenschmidt authored Apr 05, 2017

First, don't look at the interrupt status in the poll loop
to decide what to poll. It's wrong. If we have run out of
budget, we may still have RX packets to unqueue but no more
RX interrupt pending.

So instead move the code looking at the interrupt status
into the interrupt handler where it belongs. That avoids a slow
MMIO read in the NAPI fast path. We keep the abnormal interrupts
enabled while NAPI is scheduled.

While at it, actually do something useful in the "error" cases:

On AHB bus error, trigger the new reset task, that's about all
we can do. On RX packet fifo or descriptor overflows, we need
to restart the MAC after having freed things up. So set a flag
that NAPI will see and use to perform that restart after
harvesting the RX ring.

Finally, we shouldn't complete NAPI if there are still outgoing
packets that will need harvesting. Waiting for more interrupts
is less efficient than letting NAPI run a while longer while
the queue drains.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10cbd640

ftgmac100: Remove useless tests in interrupt handler · 9b86785d

Benjamin Herrenschmidt authored Apr 05, 2017

The interrupt is neither enabled nor registered when the interface
isn't running (regardless of whether we use nc-si or not) so the
test isn't useful.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b86785d

ftgmac100: Rework MAC reset and init · 874b55bf

Benjamin Herrenschmidt authored Apr 05, 2017

The HW requires a full MAC reset when changing the speed.

Additionally the Aspeed documentation spells out that the
MAC needs to be reset twice with a 10us interval.

We thus move the speed setting and top level reset code
into a new ftgmac100_reset_and_config_mac() function which
handles both. Move the ring pointers initialization there
too in order to reflect the HW change.

Also reduce the timeout for the MAC reset as it shouldn't
take more than 300 clock cycles according to the doc.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

874b55bf

ftgmac100: Add a reset task and use it for link changes · 855944ce

Benjamin Herrenschmidt authored Apr 05, 2017

Link speed changes require a full HW reset. This isn't done
properly at the moment. It will involve delays and thus isn't
suitable to do from the link poll callback.

So let's create a reset_task that we can queue up when the
link changes. It will be useful for various cases of error
handling as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

855944ce

ftgmac100: Move the bulk of inits to a separate function · da40d9d4

Benjamin Herrenschmidt authored Apr 05, 2017

The link monitoring and error handling code will have to
redo the ring inits and HW setup so move the code out of
ftgmac100_open() into a dedicated function.

This forces a bit of re-ordering of ftgmac100_open() but
nothing dramatic.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

da40d9d4

ftgmac100: Request the interrupt only after HW is reset · 81f1eca6

Benjamin Herrenschmidt authored Apr 05, 2017

The interrupt isn't shared, so this will keep it masked
until we have the HW in a known sane state.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

81f1eca6

ftgmac100: Move napi_add/del to open/close · b8dbecff

Benjamin Herrenschmidt authored Apr 05, 2017

Rather than probe/remove
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

b8dbecff

ftgmac100: Split ring alloc, init and rx buffer alloc · 87d18757

Benjamin Herrenschmidt authored Apr 05, 2017

Currently, a single function is used to allocate the rings
themselves, initialize them, populate the rx ring, and
allocate the rx buffers. The same happens on free.

This splits them into separate functions. This will be
useful when properly implementing re-initialization on
link changes and error handling when the rings will be
repopulated but not freed.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

87d18757

ftgmac100: Cleanup speed/duplex tracking and fix duplex config · 51764777

Benjamin Herrenschmidt authored Apr 05, 2017

Keep track of both the current speed and duplex settings
instead of only speed and properly apply the duplex setting
to the HW.

This reworks the adjust_link() function to also avoid trying
to reconfigure the HW when there is no link and to display
the link state to the user.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

51764777

ftgmac100: Remove "enabled" flags · 8396e1cb

Benjamin Herrenschmidt authored Apr 05, 2017

It's not used in any meaningful way
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8396e1cb

ftgmac100: Reorder struct fields and comment · 831fb338

Benjamin Herrenschmidt authored Apr 05, 2017

Reorder the fields in struct ftgmac in slightly more logical
groups. Will make more sense as I add/remove some.

No code change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

831fb338

ftgmac100: Remove "banner" comments · 3f6af0ee

Benjamin Herrenschmidt authored Apr 05, 2017

The divisions they represent are not particularily meaningful
and things are going to be moving around with upcoming changes
making these comments more a burden than anything else.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3f6af0ee

ftgmac100: Use netdev->irq instead of private copy · 60b28a11

Benjamin Herrenschmidt authored Apr 05, 2017

There's a placeholder already for the irq, use it
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

60b28a11

liquidio: fix Octeon core watchdog timeout false alarm · bb54be58

Felix Manlunas authored Apr 04, 2017

Detection of watchdog timeout of Octeon cores is flawed and susceptible to
false alarms.  Refactor by removing the detection code, and in its place,
leverage existing code that monitors for an indication from the NIC
firmware that an Octeon core crashed; expand the meaning of the indication
to "an Octeon core crashed or its watchdog timer expired".  Detection of
watchdog timeout is now delegated to an exception handler in the NIC
firmware; this is free of false alarms.

Also if there's an Octeon core crash or watchdog timeout:
(1) Disable VF Ethernet links.
(2) Decrement the module refcount by an amount equal to the number of
    active VFs of the NIC whose Octeon core crashed or had a watchdog
    timeout.  The refcount will continue to reflect the active VFs of
    other liquidio NIC(s) (if present) whose Octeon cores are faultless.

Item (2) is needed to avoid the case of not being able to unload the driver
because the module refcount is stuck at some non-zero number.  There is
code that, in normal cases, decrements the refcount upon receiving a
message from the firmware that a VF driver was unloaded.  But in
exceptional cases like an Octeon core crash or watchdog timeout, arrival of
that particular message from the firmware might be unreliable.  That normal
case code is changed to not touch the refcount in the exceptional case to
avoid contention (over the refcount) with the liquidio_watchdog kernel
thread who will carry out item (2).
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb54be58

net: usbnet: Remove unused driver_name variable · b73b3cde

Florian Fainelli authored Apr 04, 2017

With GCC 6.3, we can get the following warning:

drivers/net/usb/usbnet.c:85:19: warning: 'driver_name' defined but not
used [-Wunused-const-variable=]
 static const char driver_name [] = "usbnet";
                   ^~~~~~~~~~~
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b73b3cde

selftests/bpf: fix merge conflict · 89c0a361

Alexei Starovoitov authored Apr 06, 2017

fix artifact of merge resolution
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

89c0a361

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6f14f443

David S. Miller authored Apr 06, 2017

Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).
Signed-off-by: David S. Miller <davem@davemloft.net>

6f14f443

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ea6b1720

Linus Torvalds authored Apr 05, 2017

Pull networking fixes from David Miller:

 1) Reject invalid updates to netfilter expectation policies, from Pablo
    Neira Ayuso.

 2) Fix memory leak in nfnl_cthelper, from Jeffy Chen.

 3) Don't do stupid things if we get a neigh_probe() on a neigh entry
    whose ops lack a solicit method. From Eric Dumazet.

 4) Don't transmit packets in r8152 driver when the carrier is off, from
    Hayes Wang.

 5) Fix ipv6 packet type detection in aquantia driver, from Pavel
    Belous.

 6) Don't write uninitialized data into hw registers in bna driver, from
    Arnd Bergmann.

 7) Fix locking in ping_unhash(), from Eric Dumazet.

 8) Make BPF verifier range checks able to understand certain sequences
    emitted by LLVM, from Alexei Starovoitov.

 9) Fix use after free in ipconfig, from Mark Rutland.

10) Fix refcount leak on force commit in openvswitch, from Jarno
    Rajahalme.

11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov.

12) Fix endianness bug in be2net driver, from Suresh Reddy.

13) Don't forget to wake TX queues when processing a timeout, from
    Grygorii Strashko.

14) ARP header on-stack storage is wrong in flow dissector, from Simon
    Horman.

15) Lost retransmit and reordering SNMP stats in TCP can be
    underreported. From Yuchung Cheng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits)
  nfp: fix potential use after free on xdp prog
  tcp: fix reordering SNMP under-counting
  tcp: fix lost retransmit SNMP under-counting
  sctp: get sock from transport in sctp_transport_update_pmtu
  net: ethernet: ti: cpsw: fix race condition during open()
  l2tp: fix PPP pseudo-wire auto-loading
  bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
  l2tp: take reference on sessions being dumped
  tcp: minimize false-positives on TCP/GRO check
  sctp: check for dst and pathmtu update in sctp_packet_config
  flow dissector: correct size of storage for ARP
  net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout
  l2tp: take a reference on sessions used in genetlink handlers
  l2tp: hold session while sending creation notifications
  l2tp: fix duplicate session creation
  l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
  l2tp: fix race in l2tp_recv_common()
  sctp: use right in and out stream cnt
  bpf: add various verifier test cases for self-tests
  bpf, verifier: fix rejection of unaligned access checks for map_value_adj
  ...

ea6b1720

nfp: fix potential use after free on xdp prog · c383bdd1

Jakub Kicinski authored Apr 04, 2017

We should unregister the net_device first, before we give back
our reference on xdp_prog.  Otherwise xdp_prog may be freed
before .ndo_stop() disabled the datapath.  Found by code inspection.

Fixes: ecd63a02 ("nfp: add XDP support in the driver")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c383bdd1

bonding: attempt to better support longer hw addresses · faeeb317

Jarod Wilson authored Apr 04, 2017

People are using bonding over Infiniband IPoIB connections, and who knows
what else. Infiniband has a hardware address length of 20 octets
(INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32.
Various places in the bonding code are currently hard-wired to 6 octets
(ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides,
only alb is currently possible on Infiniband links right now anyway, due
to commit 1533e773, so the alb code is where most of the changes are.

One major component of this change is the addition of a bond_hw_addr_copy
function that takes a length argument, instead of using ether_addr_copy
everywhere that hardware addresses need to be copied about. The other
major component of this change is converting the bonding code from using
struct sockaddr for address storage to struct sockaddr_storage, as the
former has an address storage space of only 14, while the latter is 128
minus a few, which is necessary to support bonding over device with up to
MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes
up some memory corruption issues with the current code, where it's
possible to write an infiniband hardware address into a sockaddr declared
on the stack.

Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
hardware address now:

$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: mlx4_ib0 (primary_reselect always)
Currently Active Slave: mlx4_ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 100
Down Delay (ms): 100

Slave Interface: mlx4_ib0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr:
80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
Slave queue ID: 0

Slave Interface: mlx4_ib1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr:
80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
Slave queue ID: 0

Also tested with a standard 1Gbps NIC bonding setup (with a mix of
e1000 and e1000e cards), running LNST's bonding tests.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

faeeb317

tcp: fix reordering SNMP under-counting · 2d2517ee

Yuchung Cheng authored Apr 04, 2017

Currently the reordering SNMP counters only increase if a connection
sees a higher degree then it has previously seen. It ignores if the
reordering degree is not greater than the default system threshold.
This significantly under-counts the number of reordering events
and falsely convey that reordering is rare on the network.

This patch properly and faithfully records the number of reordering
events detected by the TCP stack, just like the comment says "this
exciting event is worth to be remembered". Note that even so TCP
still under-estimate the actual reordering events because TCP
requires TS options or certain packet sequences to detect reordering
(i.e. ACKing never-retransmitted sequence in recovery or disordered
 state).
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d2517ee

tcp: fix lost retransmit SNMP under-counting · ecde8f36

Yuchung Cheng authored Apr 04, 2017

The lost retransmit SNMP stat is under-counting retransmission
that uses segment offloading. This patch fixes that so all
retransmission related SNMP counters are consistent.

Fixes: 10d3be56 ("tcp-tso: do not split TSO packets at retransmit time")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ecde8f36

sfc: don't insert mc_list on low-latency firmware if it's too long · 148cbab6

Edward Cree authored Apr 04, 2017

If the mc_list is longer than 256 addresses, we enter mc_promisc mode.
If we're in mc_promisc mode and the firmware doesn't support cascaded
 multicast, normally we also insert our mc_list, to prevent stealing by
 another VI.  However, if the mc_list was too long, this isn't really
 helpful - the MC groups that didn't fit in the list can still get
 stolen, and having only some of them stealable will probably cause
 more confusing behaviour than having them all stealable.  Since
 inserting 256 multicast filters takes a long time and can lead to MCDI
 state machine timeouts, just skip the mc_list insert in this overflow
 condition.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

148cbab6

05 Apr, 2017 3 commits

Merge branch 'nfp-ksettings' · a4b7c07f

David S. Miller authored Apr 05, 2017

Jakub Kicinski says:

====================
nfp: ethtool link settings

This series adds support for getting and setting link settings
via the (moderately) new ethtool ksettings ops.

First patch introduces minimal speed and duplex reporting using
the information directly provided in PCI BAR0 memory.

Next few changes deal with the need to refresh port state read
from the service process and patch 6 finally uses that information
to provide link speed and duplex.  Patches 7 and 8 add auto
negotiation and port type reporting.

Remaining changes provide the set support for speed and auto
negotiation.  An upcoming series will also add port splitting
support via devlink.

Quite a bit of churn in this series is caused by the fact that
currently port speed and split changes will usually require a
reboot to take effect.  Current service process code is not capable
of performing MAC reinitialization after chip has been passing
traffic.  To make sure user is aware of this limitation we refuse
the configuration unless netdev is down, print warning to the logs
and if configuration was performed but did take effect we unregister
the netdev.  Service process has a "reboot needed" sticky bit, so
reloading the driver will not bring the netdev back.

Note that there is a helper in patch 13 which is marked as
__always_inline, because the FIELD_* macros require the parameters
to be known at compilation time.  I hope that is OK.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

a4b7c07f

nfp: add support for .set_link_ksettings() · 7c698737

Jakub Kicinski authored Apr 04, 2017

Support setting link speed and autonegotiation through
set_link_ksettings() ethtool op.  If the port is reconfigured
in incompatible way and reboot is required the netdev will get
unregistered and not come back until user reboots the system.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7c698737

nfp: NSP backend for link configuration operations · 5a560832

Jakub Kicinski authored Apr 04, 2017

Add NSP backend for upcoming link configuration operations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a560832