Commits · 263726053400b9c6671df8e87d3db9728199da13 · Kirill Smelkov / linux

06 Dec, 2018 33 commits

net: core: dev: Add call_netdevice_notifiers_extack() · 26372605

Petr Machata authored Dec 06, 2018

In order to propagate extack through NETDEV_PRE_UP, add a new function
call_netdevice_notifiers_extack() that primes the extack field of the
notifier info. Convert call_netdevice_notifiers() to a simple wrapper
around the new function that passes NULL for extack.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26372605

net: core: dev: Add extack argument to __dev_change_flags() · 6d040321

Petr Machata authored Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. The last missing API is __dev_change_flags().

Therefore extend __dev_change_flags() with and extra extack argument and
update the two existing users.

Since the function declaration line is changed anyway, name the struct
net_device argument to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d040321

net: core: dev: Add extack argument to dev_change_flags() · 567c5e13

Petr Machata authored Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_change_flags().

Therefore extend dev_change_flags() with and extra extack argument and
update all users. Most of the calls end up just encoding NULL, but
several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.

Since the function declaration line is changed anyway, name the other
function arguments to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

567c5e13

net: ipvlan: ipvlan_set_port_mode(): Add an extack argument · cf7686a0

Petr Machata authored Dec 06, 2018

A follow-up patch will extend dev_change_flags() with an extack
argument. Extend ipvlan_set_port_mode() to have that argument available
for the conversion.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf7686a0

net: vrf: cycle_netdev(): Add an extack argument · dc1aea1e

Petr Machata authored Dec 06, 2018

A follow-up patch will extend dev_change_flags() with an extack
argument. Extend cycle_netdev() to have that argument available for the
conversion.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc1aea1e

net: core: dev: Add extack argument to dev_open() · 00f54e68

Petr Machata authored Dec 06, 2018

In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_open().

Therefore extend dev_open() with and extra extack argument and update
all users. Most of the calls end up just encoding NULL, but bond and
team drivers have the extack readily available.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00f54e68

tcp: fix code style in tcp_recvmsg() · fdb8b298

Pedro Tammela authored Dec 06, 2018

2 goto labels are indented with a tab. remove the tabs and
keep the code style consistent.
Signed-off-by: Pedro Tammela <pctammela@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fdb8b298

Merge branch 'dsa-mtu' · d6a4b570

David S. Miller authored Dec 06, 2018

Andrew Lunn says:

====================
Adjust MTU of DSA master interface

DSA makes use of additional headers to direct a frame in/out of a
specific port of the switch. When the slave interfaces uses an MTU of
1500, the master interface can be asked to handle frames with an MTU
of 1504, or 1508 bytes. Some Ethernet interfaces won't
transmit/receive frames which are bigger than their MTU.

Automate the increasing of the MTU on the master interface, by adding
to each tagging driver how much overhead they need, and then calling
dev_set_mtu() of the master interface to increase its MTU as needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

d6a4b570

net: dsa: Set the master device's MTU to account for DSA overheads · dc0fe7d4

Andrew Lunn authored Dec 06, 2018

DSA tagging of frames sent over the master interface to the switch
increases the size of the frame. Such frames can then be bigger than
the normal MTU of the master interface, and it may drop them. Use the
overhead information from the tagger to set the MTU of the master
device to include this overhead.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc0fe7d4

net: dsa: Add overhead to tag protocol ops. · a5dd3087

Andrew Lunn authored Dec 06, 2018

Each DSA tag protocol needs to add additional headers to the Ethernet
frame in order to direct it towards a specific switch egress port. It
must also remove the head from a frame received from a
switch. Indicate the maximum size of these headers in the tag protocol
ops structure, so the core can take these overheads into account.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5dd3087

tun: remove unnecessary check in tun_flow_update · 5c327f67

Li RongQing authored Dec 06, 2018

caller has guaranted that rxhash is not zero
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c327f67

tun: align write-heavy flow entry members to a cache line · 83b1bc12

Li RongQing authored Dec 06, 2018

tun flow entry 'updated' fields are written when receive
every packet. Thus if a flow is receiving packets from a
particular flow entry, it'll cause false-sharing with
all the other who has looked it up, so move it in its own
cache line

and update 'queue_index' and 'update' field only when
they are changed to reduce the cache false-sharing.
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Wang Li <wangli39@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

83b1bc12

neighbor: Add extack messages for add and delete commands · 7a35a50d

David Ahern authored Dec 05, 2018

Add extack messages for failures in neigh_add and neigh_delete.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a35a50d

tipc: fix node keep alive interval calculation · f5d6c3e5

Hoang Le authored Dec 06, 2018

When setting LINK tolerance, node timer interval will be calculated
base on the LINK with lowest tolerance.

But when calculated, the old node timer interval only updated if current
setting value (tolerance/4) less than old ones regardless of number of
links as well as links' lowest tolerance value.

This caused to two cases missing if tolerance changed as following:
Case 1:
1.1/ There is one link (L1) available in the system
1.2/ Set L1's tolerance from 1500ms => lower (i.e 500ms)
1.3/ Then, fallback to default (1500ms) or higher (i.e 2000ms)

Expected:
    node timer interval is 1500/4=375ms after 1.3

Result:
node timer interval will not being updated after changing tolerance at 1.3
since its value 1500/4=375ms is not less than 500/4=125ms at 1.2.

Case 2:
2.1/ There are two links (L1, L2) available in the system
2.2/ L1 and L2 tolerance value are 2000ms as initial
2.3/ Set L2's tolerance from 2000ms => lower 1500ms
2.4/ Disable link L2 (bring down its bearer)

Expected:
    node timer interval is 2000ms/4=500ms after 2.4

Result:
node timer interval will not being updated after disabling L2 since
its value 2000ms/4=500ms is still not less than 1500/4=375ms at 2.3
although L2 is already not available in the system.

To fix this, we start the node interval calculation by initializing it to
a value larger than any conceivable calculated value. This way, the link
with the lowest tolerance will always determine the calculated value.
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

f5d6c3e5

net: Use of_node_name_eq for node name comparisons · bf5849f1

Rob Herring authored Dec 05, 2018

Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.

For instances using of_node_cmp, this has the side effect of now using
case sensitive comparisons. This should not matter for any FDT based
system which all of these are.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Claudiu Manoil <claudiu.manoil@nxp.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Wingman Kwok <w-kwok2@ti.com>
Cc: Murali Karicheri <m-karicheri2@ti.com>
Cc: netdev@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf5849f1

net: netem: use a list in addition to rbtree · d66280b1

Peter Oskolkov authored Dec 04, 2018

When testing high-bandwidth TCP streams with large windows,
high latency, and low jitter, netem consumes a lot of CPU cycles
doing rbtree rebalancing.

This patch uses a linear list/queue in addition to the rbtree:
if an incoming packet is past the tail of the linear queue, it is
added there, otherwise it is inserted into the rbtree.

Without this patch, perf shows netem_enqueue, netem_dequeue,
and rb_* functions among the top offenders. With this patch,
only netem_enqueue is noticeable if jitter is low/absent.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Oskolkov <posk@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d66280b1

Merge branch 'net-bridge-convert-multicast-to-generic-rhashtable' · 932c4417

David S. Miller authored Dec 05, 2018

Nikolay Aleksandrov says:

====================
net: bridge: convert multicast to generic rhashtable

The current bridge multicast code uses a custom rhashtable
implementation which predates the generic rhashtable API. Patch 01
converts it to use the generic kernel rhashtable which simplifies the
code a lot and removes duplicated functionality. The convert also makes
hash_elasticity obsolete as the generic rhashtable already has such
checks and has a fixed elasticity of RHT_ELASTICITY (16 currently) so we
emit a warning whenever elasticity is set and return RHT_ELASTICITY when
read (patch 03). Patch 02 converts the multicast code to use non-bh RCU
flavor as it was mixing bh and non-bh. Since now we have the generic
rhashtable which autoshrinks we can be more liberal with the default
hash maximum so patch 04 increases it to 4096 and moves it to a define in
br_private.h.

v3: add non-rcu br_mdb_get variant and use it where we have
    multicast_lock, drop special hash_max handling and just set it where
    needed and use non-bh RCU consistently (patch 02, new)
v2: send the latest version of the set which handles when IGMP snooping
    is not defined, changes are in patch 01
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

932c4417

net: bridge: increase multicast's default maximum number of entries · d08c6bc0

Nikolay Aleksandrov authored Dec 05, 2018

bridge's default hash_max was 512 which is rather conservative, now that
we're using the generic rhashtable API which autoshrinks let's increase
it to 4096 and move it to a define in br_private.h.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d08c6bc0

net: bridge: mark hash_elasticity as obsolete · cf332bca

Nikolay Aleksandrov authored Dec 05, 2018

Now that the bridge multicast uses the generic rhashtable interface we
can drop the hash_elasticity option as that is already done for us and
it's hardcoded to a maximum of RHT_ELASTICITY (16 currently). Add a
warning about the obsolete option when the hash_elasticity is set.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf332bca

net: bridge: multicast: use non-bh rcu flavor · 4329596c

Nikolay Aleksandrov authored Dec 05, 2018

The bridge multicast code has been using a mix of RCU and RCU-bh flavors
sometimes in questionable way. Since we've moved to rhashtable just use
non-bh RCU everywhere. In addition this simplifies freeing of objects
and allows us to remove some unnecessary callback functions.

v3: new patch
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4329596c

net: bridge: convert multicast to generic rhashtable · 19e3a9c9

Nikolay Aleksandrov authored Dec 05, 2018

The bridge multicast code currently uses a custom resizable hashtable
which predates the generic rhashtable interface. It has many
shortcomings compared and duplicates functionality that is presently
available via the generic rhashtable, so this patch removes the custom
rhashtable implementation in favor of the kernel's generic rhashtable.
The hash maximum is kept and the rhashtable's size is used to do a loose
check if it's reached in which case we revert to the old behaviour and
disable further bridge multicast processing. Also now we can support any
hash maximum, doesn't need to be a power of 2.

v3: add non-rcu br_mdb_get variant and use it where multicast_lock is
    held to avoid RCU splat, drop hash_max function and just set it
    directly

v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit
    placeholders
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19e3a9c9

Merge tag 'mlx5e-updates-2018-12-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ba5dfaff

David S. Miller authored Dec 05, 2018

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-04

This series includes updates to mlx5e netdevice driver

From Saeed, Remove trailing space of tx_pause ethtool stat
From Gal, Cleanup unused defines
From Aya, ethtool Support for configuring of RX hash fields
From Tariq, Improve ethtool private-flags code structure
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

ba5dfaff

Merge branch 'u32-to-linkmode-fixes' · 7127f2fe

David S. Miller authored Dec 05, 2018

Andrew Lunn says:

====================
u32 to linkmode fixes

This patchset fixes issues found in the last patchset which converted
the phydev advertise etc, from a u32 to a linux bitmap. Most of the
issues are the result of clearing bits which should not of been
cleared. To make the API clearer, the idea from Heiner Kallweit was
used, with _mod_ to indicate the function modifies just the bits it
needs to, or _to_ to clear all bits and just set bit that need to be
set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

7127f2fe

net: phy: Fix ioctl handler when modifing MII_ADVERTISE · 9db299c7

Andrew Lunn authored Dec 05, 2018

When the MII_ADVERTISE register is modified by the IOCTL handler,
phydev->advertising needs recalculating. Use the _mod_ variant of
mii_adv_to_linkmode_adv_t so that bits outside of the advertise
registers are not cleared.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

9db299c7

net: mii: mii_lpa_mod_linkmode_lpa_t: Make use of linkmode_mod_bit helper · 6dbd0090

Andrew Lunn authored Dec 05, 2018

Replace the if else code structure with a call to the helper
linkmode_mod_bit.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6dbd0090

net: mii: Add mii_lpa_mod_linkmode_lpa_t · d3351931

Andrew Lunn authored Dec 05, 2018

Add a _mod_ variant of mii_lpa_to_linkmode_lpa_t. Use this to fix the
genphy_read_status() where the 1G link partner features are getting
lost.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3351931

phy: marvell: Rename mii_lpa_to_linkmode_lpa_t · ab9cb729

Andrew Lunn authored Dec 05, 2018

Rename mii_lpa_to_linkmode_lpa_t to mii_lpa_mod_linkmode_lpa_t to
indicate it modifies the passed linkmode bitmap, without clearing any
other bits.

Also, ensure bit are clear which the lpa indicates should not be set.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab9cb729

net: mii: Rename mii_stat1000_to_linkmode_lpa_t · 78a24df3

Andrew Lunn authored Dec 05, 2018

Rename mii_stat1000_to_linkmode_lpa_t to
mii_stat1000_mod_linkmode_lpa_t to indicate it modifies the passed
linkmode bitmap, without clearing any other bits.

Add a helper to set/clear bits in a linkmode.

Use this helper to ensure bit are clear which the stat1000 indicates
should not be set.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

78a24df3

net: mii: Fix autoneg in mii_lpa_to_linkmode_lpa_t() · 5f15eed2

Andrew Lunn authored Dec 05, 2018

mii_adv_to_linkmode_adv_t() clears all bits before setting it needs to
set. This means the freshly set Autoneg gets cleared.

Change the order, and add comments about it clearing the old content
of the bitmap.

Fixes: c0ec3c27 ("net: phy: Convert u32 phydev->lp_advertising to linkmode")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f15eed2

net/mlx5e: Improve ethtool private-flags code structure · 8ff57c18

Tariq Toukan authored Nov 19, 2018

Refactor the code of private-flags setter.
Replace consecutive calls to mlx5e_handle_pflag with a loop
that uses a preset set of parameters.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

8ff57c18

net/mlx5e: ethtool, Support user configuration for RX hash fields · 756c4160

Aya Levin authored Oct 23, 2018

Enable user configuration of RX hash fields that are used for traffic
spreading into RX queues. User can change built-in RSS (Receive Side
Scaling) profiles on the following traffic types: UDP4, UDP6, TCP4 and
TCP6.  This configuration effects both outer and inner headers.  Added
support for ethtool commands: ETHTOOL_SRXFH and ETHTOOL_GRXFH.

Command example respectively:
$ethtool -N eth1 rx-flow-hash tcp4 sdfn
$ethtool -n eth1 rx-flow-hash tcpp4
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

756c4160

net/mlx5e: Move RSS params to a dedicated struct · bbeb53b8

Aya Levin authored Nov 06, 2018

Remove RSS params from params struct under channels, and introduce
a new struct with RSS configuration params under priv struct. There is
no functional change here.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

bbeb53b8

net/mlx5e: Refactor TIR configuration function · d930ac79

Aya Levin authored Oct 28, 2018

Refactor mlx5e_build_indir_tir_ctx_hash for better code re-use. TIR
stands for Transport Interface Receive, which is responsible for all
transport related operations on the receive side. Added a
static array with TIR default configuration values. This separates
configuration values from command setting, which is needed for
downstream patch.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d930ac79

05 Dec, 2018 7 commits

net: documentation: build a directory structure for drivers · b255e500

Jakub Kicinski authored Dec 03, 2018

Documentation/networking/ is full of cryptically named files with
driver documentation.  This makes finding interesting information
at a glance really hard.  Move all those files into a directory
called device_drivers (since not all drivers are for device) and
fix up references.

RFC v0.1 -> RFC v1:
 - also add .txt suffix to the files which are missing it (Quentin)
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Henrik Austad <henrik@austad.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

b255e500

tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT · a74f0fa0

Eric Dumazet authored Dec 04, 2018

TCP_NOTSENT_LOWAT socket option or sysctl was added in linux-3.12
as a step to enable bigger tcp sndbuf limits.

It works reasonably well, but the following happens :

Once the limit is reached, TCP stack generates
an [E]POLLOUT event for every incoming ACK packet.

This causes a high number of context switches.

This patch implements the strategy David Miller added
in sock_def_write_space() :

 - If TCP socket has a notsent_lowat constraint of X bytes,
   allow sendmsg() to fill up to X bytes, but send [E]POLLOUT
   only if number of notsent bytes is below X/2

This considerably reduces TCP_NOTSENT_LOWAT overhead,
while allowing to keep the pipe full.

Tested:
 100 ms RTT netem testbed between A and B, 100 concurrent TCP_STREAM

A:/# cat /proc/sys/net/ipv4/tcp_wmem
4096	262144	64000000
A:/# super_netperf 100 -H B -l 1000 -- -K bbr &

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 1364904 # This is about 54 MB of memory per flow :/

A:/# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 256220672  13532 694976    0    0    10     0   28   14  0  1 99  0  0
 2  0      0 256320016  13532 698480    0    0   512     0 715901 5927  0 10 90  0  0
 0  0      0 256197232  13532 700992    0    0   735    13 771161 5849  0 11 89  0  0
 1  0      0 256233824  13532 703320    0    0   512    23 719650 6635  0 11 89  0  0
 2  0      0 256226880  13532 705780    0    0   642     4 775650 6009  0 12 88  0  0

A:/# echo 2097152 >/proc/sys/net/ipv4/tcp_notsent_lowat

A:/# grep TCP /proc/net/sockstat
TCP: inuse 203 orphan 0 tw 19 alloc 414 mem 86411 # 3.5 MB per flow

A:/# vmstat 5 5  # check that context switches have not inflated too much.
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 260386512  13592 662148    0    0    10     0   17   14  0  1 99  0  0
 0  0      0 260519680  13592 604184    0    0   512    13 726843 12424  0 10 90  0  0
 1  1      0 260435424  13592 598360    0    0   512    25 764645 12925  0 10 90  0  0
 1  0      0 260855392  13592 578380    0    0   512     7 722943 13624  0 11 88  0  0
 1  0      0 260445008  13592 601176    0    0   614    34 772288 14317  0 10 90  0  0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a74f0fa0

Merge branch 'act_tunnel_key-support-key-less-tunnels' · 4dc88ce6

David S. Miller authored Dec 04, 2018

Or Gerlitz says:

====================
net/sched: act_tunnel_key: support key-less tunnels

This short series from Adi Nissim allows to support key-less tunnels
by the tc tunnel key actions, which is needed for some GRE use-cases.

changes from V0:
 - addresses build warning spotted by kbuild, make sure to always init
   to zero the tunnel key
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

4dc88ce6

net/sched: act_tunnel_key: Don't dump dst port if it wasn't set · 1c25324c

Adi Nissim authored Dec 02, 2018

It's possible to set a tunnel without a destination port. However,
on dump(), a zero dst port is returned to user space even if it was not
set, fix that.

Note that so far it wasn't required, b/c key less tunnels were not
supported and the UDP tunnels do require destination port.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c25324c

net/sched: act_tunnel_key: Allow key-less tunnels · 80ef0f22

Adi Nissim authored Dec 02, 2018

Allow setting a tunnel without a tunnel key. This is required for
tunneling protocols, such as GRE, that define the key as an optional
field.
Signed-off-by: Adi Nissim <adin@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

80ef0f22

qed: fix spelling mistake "Dispalying" -> "Displaying" · d1ecf8a6

Colin Ian King authored Dec 03, 2018

There is a spelling mistake in a DP_NOTICE message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1ecf8a6

net/mlx5e: Move modify tirs hash functionality · 080d1b17

Aya Levin authored Oct 23, 2018

Move modify tirs hash functionality (mlx5e_modify_tirs_hash) from
en_ethtool.c to en_main.c. This allows future use of this fuctionality
from en_fs_ethtool.c, while keeping current convention: en_ethtool.c
doesn't have an API.  There is no functional change here.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

080d1b17