Commits · 291f445eab9b2b11c8dd05ae1a0943c328cb9b6b · Kirill Smelkov / linux

28 Mar, 2018 2 commits

net/mlx5e: Disable Striding RQ when PCI is slower than link · 291f445e

Tariq Toukan authored Feb 11, 2018

We turn the feature off for servers whose PCI BW is bounded
by a threshold (16G) and is lower than the MAX LINK BW.
This improves the effectiveness of the CQE compression feature,
which defaults to ON in the same case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

291f445e

net/mlx5e: Unify slow PCI heuristic · 0608d4db

Tariq Toukan authored Jan 17, 2018

Get the link/pci speed query and logic into a single function.
Unify the heuristics and use a single PCI threshold (16G) for all.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

0608d4db
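
The heuristic described by the two mlx5e commits above can be sketched roughly as follows. This is a minimal model, not the driver code: the function names, the Mb/s unit, the strict comparisons, and the handling of unknown readings are all assumptions made for illustration. Only the 16G threshold, the "PCI lower than link" condition, and the resulting feature defaults come from the commit messages.

```python
# Threshold from the commit messages ("16G"); the Mb/s unit is an assumption.
PCI_BW_THRESHOLD_MBPS = 16_000


def slow_pci_heuristic(link_speed_mbps: int, pci_bw_mbps: int) -> bool:
    """Return True when PCI bandwidth is the likely bottleneck (hypothetical helper)."""
    # Treating unknown (zero) readings as "not slow" is an assumption.
    if link_speed_mbps <= 0 or pci_bw_mbps <= 0:
        return False
    # PCI BW is bounded by the threshold AND is lower than the max link BW.
    return pci_bw_mbps < PCI_BW_THRESHOLD_MBPS and pci_bw_mbps < link_speed_mbps


def default_rx_features(link_speed_mbps: int, pci_bw_mbps: int) -> dict:
    """Pick feature defaults from the single heuristic (other conditions ignored)."""
    slow = slow_pci_heuristic(link_speed_mbps, pci_bw_mbps)
    # On a slow PCI link: Striding RQ off, CQE compression on.
    return {"striding_rq": not slow, "cqe_compression": slow}
```

With these assumptions, a 25G link behind 8G of PCI bandwidth counts as slow, while a 100G link behind 32G of PCI bandwidth does not, because 32G is above the threshold.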

27 Mar, 2018 27 commits

Merge branch 'sfc-filter-locking' · 5d22d47b

David S. Miller authored Mar 27, 2018

Edward Cree says:

====================
sfc: rework locking around filter management

The use of a spinlock to protect filter state combined with the need for a
sleeping operation (MCDI) to apply that state to the NIC (on EF10) led to
unfixable race conditions, around the handling of filter restoration after
an MC reboot.
So, this patch series removes the requirement to be able to modify the SW
filter table from atomic context, by using a workqueue to request
asynchronous filter operations (which are needed for ARFS).  Then, the
filter table locks are changed to mutexes, replacing the dance of spinlocks
and 'busy' flags.  Also, a mutex is added to protect the RSS context state,
since otherwise a similar race is possible around restoring that after an
MC reboot.  While we're at it, fix a couple of other related bugs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5d22d47b

sfc: fix flow type handling for RSS filters · a8e8fbeb

Edward Cree authored Mar 27, 2018

The FLOW_RSS flag was causing us to insert UDP filters when TCP was wanted.

Fixes: 42356d9a ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8e8fbeb

sfc: protect list of RSS contexts under a mutex · e0a65e3c

Edward Cree authored Mar 27, 2018

Otherwise races are possible between ethtool ops and
efx_ef10_rx_restore_rss_contexts().
Also, don't try to perform the restore on every reset, only after an MC
reboot, otherwise we'll leak RSS contexts on the NIC.

Fixes: 42356d9a ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0a65e3c

sfc: return a better error if filter insertion collides with MC reboot · 31b84295

Edward Cree authored Mar 27, 2018

If some other operation gets the MCDI lock ahead of us and performs an MC
reboot, then our attempt to insert the filter will fail with EINVAL,
because the destination VI (spec->dmaq_id, MC_CMD_FILTER_OP_IN_RX_QUEUE) does
not exist. But the caller's request (which might e.g. be an ethtool ntuple
request from userland) isn't invalid, it just got unlucky; so return EAGAIN.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31b84295
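
The error translation described in the commit above can be modelled as follows. This is only a sketch under stated assumptions: `filter_insert_rc`, the `mc_rebooted` flag, and the retry helper are hypothetical names, and how the driver actually detects the reboot is not stated in the commit message.

```python
import errno


def filter_insert_rc(raw_rc: int, mc_rebooted: bool) -> int:
    """Translate the raw insert result into the code handed back to the caller.

    `mc_rebooted` is a hypothetical flag standing in for whatever the driver
    uses to know that an MC reboot raced with the insertion.
    """
    if raw_rc == -errno.EINVAL and mc_rebooted:
        # The request was valid; it just got unlucky. Ask the caller to retry.
        return -errno.EAGAIN
    return raw_rc


def insert_with_retry(try_insert, attempts: int = 3) -> int:
    """Hypothetical caller-side loop that retries while it sees -EAGAIN."""
    rc = -errno.EAGAIN
    for _ in range(attempts):
        rc = try_insert()
        if rc != -errno.EAGAIN:
            break
    return rc
```

The point of the distinction is that a caller can treat EAGAIN as transient and try again, whereas EINVAL would tell it that the request itself was malformed.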

sfc: use a semaphore to lock farch filters too · fc7a6c28

Edward Cree authored Mar 27, 2018

With this change, the spinlock efx->filter_lock is no longer used and is
thus removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc7a6c28

sfc: give ef10 its own rwsem in the filter table instead of filter_lock · c2bebe37

Edward Cree authored Mar 27, 2018

efx->filter_lock remains in place for use on farch, but EF10 now ignores it.
EFX_EF10_FILTER_FLAG_BUSY is no longer needed, hence it is removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2bebe37

sfc: replace asynchronous filter operations · 3af0f342

Edward Cree authored Mar 27, 2018

Instead of having an efx->type->filter_rfs_insert() method, just use
workitems with a worker function that calls efx->type->filter_insert().
The only user of this is efx_filter_rfs(), which now queues a call to
efx_filter_rfs_work().
Similarly, efx_filter_rfs_expire() is now a worker function called on a
new channel->filter_work work_struct, so the method
efx->type->filter_rfs_expire_one() is no longer called in atomic context.
We also add a new mutex efx->rps_mutex to protect the RPS state
(efx->rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so
that the taking of efx->filter_lock can be moved to
efx->type->filter_rfs_expire_one().
Thus, all filter table functions are now called in a sleepable context,
allowing them to use sleeping locks in a future patch.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3af0f342

Merge branch 'pernet-all-async' · c709002c

David S. Miller authored Mar 27, 2018

Kirill Tkhai says:

====================
Make pernet_operations always read locked

All the pernet_operations are converted, and the last one
is in this patchset (nfsd_net_ops acked by J. Bruce Fields).
So, it's time to kill the pernet_operations::async field,
and make setup_net() and cleanup_net() always require
the rwsem to be only read locked.

All further pernet_operations have to be developed to fit
this rule. Some of the previous patches added a comment to
struct pernet_operations about that.

Also, this patchset renames net_sem to pernet_ops_rwsem
to make the target area of the rwsem more clearly visible,
and adds more comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c709002c

net: Add more comments · 8518e9bb

Kirill Tkhai authored Mar 27, 2018

This adds comments to different places to improve
readability.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8518e9bb

net: Rename net_sem to pernet_ops_rwsem · 4420bf21

Kirill Tkhai authored Mar 27, 2018

The name net_sem does not say which area the semaphore protects,
so it is better to make the area more defined.

Rename it to pernet_ops_rwsem for better readability and
intelligibility.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4420bf21

net: Drop pernet_operations::async · 2f635cee

Kirill Tkhai authored Mar 27, 2018

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2f635cee

net: Reflect all pernet_operations are converted · 094374e5

Kirill Tkhai authored Mar 27, 2018

All pernet_operations are reviewed and converted, hooray!
Reflect this in core code: setup_net() and cleanup_net()
will take down_read() always.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

094374e5

net: Convert nfsd_net_ops · 67441c24

Kirill Tkhai authored Mar 27, 2018

These pernet_operations look similar to rpcsec_gss_net_ops;
they just create and destroy other caches. So, they can also
be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67441c24

net: mvpp2: Use relaxed I/O in data path · cdcfeb0f

Yan Markman authored Mar 27, 2018

Use relaxed I/O on the hot path. This achieves significant performance
improvements. On a 10G link, this makes a basic iperf TCP test go from
an average of 4.5 Gbits/sec to about 9.40 Gbits/sec.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message, cosmetic changes]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdcfeb0f

Merge tag 'mlx5-updates-2018-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 5f75a186

David S. Miller authored Mar 27, 2018

Saeed Mahameed says:

====================
mlx5-updates-2018-03-22 (Misc updates)

This series includes misc updates for the mlx5 core and netdev driver.

Highlights:

From Inbar, three patches to add support for PFC stall prevention
statistics and enable/disable through new ethtool tunable, as requested
from previous submission.

From Moshe, four patches, added more drop counters:
	- drop counter for netdev steering miss
	- drop counter for when VF logical link is down
	- drop counter for when netdev logical link is down.

From Or, three patches to support vlan push/pop offload via tc HW action,
for newer HW (ConnectX-5 and onward) via HW steering flow actions rather
than the emulated path used for the older HW brands.

And five more misc small trivial patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

5f75a186

liquidio: Removed duplicate Tx queue status check · 4171ec06

Intiyaz Basha authored Mar 26, 2018

NAPI already checks the Tx queue status and wakes the Tx queue if required.
The same operation is also done while freeing every Tx buffer.
So remove the duplicate Tx queue status check from the Tx
buffer free functions.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4171ec06

ipv6: addrconf: Use normal debugging style · e32ac250

Joe Perches authored Mar 26, 2018

Remove local ADBG macro and use netdev_dbg/pr_debug

Miscellanea:

o Remove unnecessary debug message after allocation failure as there
  already is a dump_stack() on the failure paths
o Leave the allocation failure message on snmp6_alloc_dev as there
  is one code path that does not do a dump_stack()
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e32ac250

tc-testing: Correct compound statements for namespace execution · cd464197

Lucas Bates authored Mar 26, 2018

If tdc is executing test cases inside a namespace, only the
first command in a compound statement will be executed inside
the namespace by tdc. As a result, the subsequent commands
are not executed inside the namespace and the test will fail.

Example:

for i in {x..y}; do args="foo"; done && tc actions add $args

The namespace execution feature will prepend 'ip netns exec'
to the command:

ip netns exec tcut for i in {x..y}; do args="foo"; done && \
  tc actions add $args

So the actual tc command is not parsed by the shell as being
part of the namespace execution.

Enclosing these compound statements inside a bash invocation
with proper escape characters resolves the problem by creating
a subshell inside the namespace.
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd464197
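
The problem and the fix described in the commit above can be illustrated with a small sketch. The helper names are hypothetical, and `shlex.quote` is used here as a convenient stand-in for the hand-written escape characters the commit mentions; it is not necessarily how the test cases themselves were changed.

```python
import shlex


def naive_netns_cmd(ns: str, cmd: str) -> str:
    """Prepend 'ip netns exec' directly: only the first command runs in the namespace."""
    return f"ip netns exec {ns} {cmd}"


def wrapped_netns_cmd(ns: str, cmd: str) -> str:
    """Run the whole compound statement in a bash subshell inside the namespace."""
    # shlex.quote() makes the compound statement a single argument to 'bash -c'.
    return f"ip netns exec {ns} bash -c {shlex.quote(cmd)}"


if __name__ == "__main__":
    compound = 'for i in {x..y}; do args="foo"; done && tc actions add $args'
    print(naive_netns_cmd("tcut", compound))
    print(wrapped_netns_cmd("tcut", compound))
```

In the naive form the outer shell splits the line at `;` and `&&`, so `tc actions add` runs outside the namespace. In the wrapped form the outer shell sees one quoted argument, and the bash instance started by `ip netns exec` parses and runs every part of it.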

tipc: tipc_node_create() can be static · e1a22d13

Wei Yongjun authored Mar 26, 2018

Fixes the following sparse warning:

net/tipc/node.c:336:18: warning:
 symbol 'tipc_node_create' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1a22d13

tipc: fix error handling in tipc_udp_enable() · c76f2481

Wei Yongjun authored Mar 26, 2018

Release the allocated resources before returning from the error handling
path in tipc_udp_enable(); otherwise memory is leaked.

Fixes: 52dfae5c ("tipc: obtain node identity from interface by default")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c76f2481

net: aquantia: Make function hw_atl_utils_mpi_set_speed() static · 6a91ded3

Wei Yongjun authored Mar 26, 2018

Fixes the following sparse warning:

drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:508:5: warning:
 symbol 'hw_atl_utils_mpi_set_speed' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6a91ded3

Merge branch 'net-mvpp2-Remove-unnecessary-dynamic-allocs' · 776d7c5f

David S. Miller authored Mar 27, 2018

Maxime Chevallier says:

====================
net: mvpp2: Remove unnecessary dynamic allocs

Some utility functions in mvpp2 make use of dynamic alloc to exchange temporary
objects representing Parser Entries (which are generic filtering entries in the
PPv2 controller).

These objects are small (44 bytes each), so we can use the stack to exchange them.

Some previous discussion on this topic showed that mvpp2_prs_hw_read, which
initializes a struct mvpp2_prs_entry based on one of its fields, can easily lead
to erroneous code if we don't zero-out the struct beforehand:

https://lkml.org/lkml/2018/3/21/739

To fix this, I propose to rename mvpp2_prs_hw_read into mvpp2_prs_init_from_hw,
make it zero-out the struct and take the index as a parameter. That's what's
done in the first patch of the series.

The second patch is the V3 of
("net: mvpp2: Don't use dynamic allocs for local variables"), making use of
mvpp2_prs_init_from_hw and taking previous comments into account.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

776d7c5f

net: mvpp2: Don't use dynamic allocs for local variables · 0c6d9b44

Maxime Chevallier authored Mar 26, 2018

Some helper functions that search for given entries in the TCAM filter
on the PPv2 controller make use of dynamically allocated temporary variables,
allocated with GFP_KERNEL. These functions can be called in atomic
context, and dynamic allocation is not really needed in these cases anyway.

This commit gets rid of the dynamic allocs and uses stack allocation in the
following functions, and where they're used:
 - mvpp2_prs_flow_find
 - mvpp2_prs_vlan_find
 - mvpp2_prs_double_vlan_find
 - mvpp2_prs_mac_da_range_find

For all these functions, instead of returning a temporary object
representing the TCAM entry, we simply return the TCAM id that matches
the requested entry.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c6d9b44

net: mvpp2: Make mvpp2_prs_hw_read a parser entry init function · 47e0e14e

Maxime Chevallier authored Mar 26, 2018

The mvpp2_prs_hw_read function uses the 'index' field of the struct
mvpp2_prs_entry to initialize the rest of the fields. This makes it
unclear from a caller's perspective, who needs to manipulate a struct
that is not entirely initialized.

This commit makes it an init function for prs_entry, by passing it the
index as a parameter. The function now zeroes the entry, and sets the
index field before doing all other init from HW.

The function is renamed 'mvpp2_prs_init_from_hw' to make that clear.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

47e0e14e
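
The init-function pattern described in the commit above (zero the entry, set the index from a parameter, then read the remaining fields from hardware) can be modelled as below. This is a sketch only: the `PrsEntry` layout, the word counts, and the dict that stands in for the hardware tables are assumptions for illustration, not the driver's definitions.

```python
from dataclasses import dataclass, field

# Hypothetical sizes for illustration; the real struct layout is not shown here.
TCAM_WORDS = 6
SRAM_WORDS = 4


@dataclass
class PrsEntry:
    index: int = 0
    tcam: list = field(default_factory=lambda: [0] * TCAM_WORDS)
    sram: list = field(default_factory=lambda: [0] * SRAM_WORDS)


def prs_init_from_hw(hw: dict, pe: PrsEntry, tid: int) -> bool:
    """Initialise `pe` for entry `tid`; `hw` is a dict standing in for the NIC tables."""
    # 1. Zero the whole entry so no stale data from an earlier use survives.
    pe.index = 0
    pe.tcam = [0] * TCAM_WORDS
    pe.sram = [0] * SRAM_WORDS
    # 2. Take the index from the explicit parameter, not from the entry itself.
    pe.index = tid
    # 3. Fill the remaining fields from "hardware", if the entry exists there.
    words = hw.get(tid)
    if words is None:
        return False
    tcam, sram = words
    pe.tcam = list(tcam)
    pe.sram = list(sram)
    return True
```

Passing the index as a parameter means the caller never has to hand over a half-initialised struct, and the zeroing step guarantees that a reused entry carries no leftover words even when the lookup fails.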

net/ncsi: check for null return from call to nla_nest_start · 8daf1a2d

Colin Ian King authored Mar 26, 2018

The call to nla_nest_start calls nla_put which can lead to a NULL
return so it's possible for attr to become NULL and we can potentially
get a NULL pointer dereference on attr.  Fix this by checking for
a NULL return.

Detected by CoverityScan, CID#1466125 ("Dereference null return")

Fixes: 955dc68c ("net/ncsi: Add generic netlink family")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8daf1a2d

sctp: remove unnecessary asoc in sctp_has_association · 53066538

Xin Long authored Mar 26, 2018

After commit dae399d7 ("sctp: hold transport instead of assoc
when lookup assoc in rx path"), sctp_has_association puts the transport
instead of the asoc, so the variable 'asoc' is not used any more.

So this patch removes it. While at it, it also changes the
return type of sctp_has_association to bool, and does the same
for its caller sctp_endpoint_is_peeled_off.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53066538

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 13d5a30a

David S. Miller authored Mar 27, 2018

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2018-03-26

This series contains updates to i40e only.

Jake provides several patches which remove the need for cmpxchg64(),
starting with moving I40E_FLAG_[UDP]_FILTER_SYNC from pf->flags to pf->state
since they are modified during run time, possibly when the RTNL lock is not
held, so they should be state bits and not flags.  Moved additional
"flags" which should be state fields into pf->state.  Ensure we hold
the RTNL lock for the entire sequence of preparing for reset and when
resuming, which will protect the flags related to interrupt scheme under
RTNL lock so that their modification is properly threaded.  Finally,
cleanup the use of cmpxchg64() since it is no longer needed.  Cleaned up
the holes in the feature flags created by moving some flags to the state
field.

Björn Töpel adds XDP_REDIRECT support as well as tweaking the page
counting for XDP_REDIRECT so that it will function properly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

13d5a30a

26 Mar, 2018 11 commits

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 34fd03b9

David S. Miller authored Mar 26, 2018

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2018-03-26

This patch series adds the ice driver, which will support the Intel(R)
E800 Series of network devices.

This is the first phase in the release of this driver where we implement
basic transmit and receive. The idea behind the multi-phase release is to
aid in code review as well as testing. Subsequent phases will implement
advanced features (like SR-IOV, tunnelling, flow director, QoS, etc.) that
build upon the previous phase(s). Each phase will be submitted as a patch
series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

34fd03b9

i40e: add support for XDP_REDIRECT · d9314c47

Björn Töpel authored Mar 22, 2018

The driver now acts upon the XDP_REDIRECT return action. Two new ndos
are implemented, ndo_xdp_xmit and ndo_xdp_flush.

The XDP_REDIRECT action enables an XDP program to redirect frames to other
netdevs.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

d9314c47

i40e: tweak page counting for XDP_REDIRECT · 8ce29c67

Björn Töpel authored Mar 22, 2018

This commit tweaks the page counting for XDP_REDIRECT to function
properly. XDP_REDIRECT support will be added in a future commit.

The current page counting scheme assumes that the reference count
cannot decrease until the received frame is sent to the upper layers
of the networking stack. This assumption does not hold for the
XDP_REDIRECT action, since a page (pointed out by xdp_buff) can have
its reference count decreased via the xdp_do_redirect call.

To work around that, we now start off with a large page count and then
don't allow a refcount less than two.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

8ce29c67

i40e: re-number feature flags to remove gaps · 8f769dd1

Jacob Keller authored Mar 16, 2018

Remove the gaps created by the recent refactor of various feature flags
that have moved to the state field. Use only a u32 now that we have
fewer than 32 flags in the field.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

8f769dd1

i40e: stop using cmpxchg flow in i40e_set_priv_flags() · 886ff146

Jacob Keller authored Mar 16, 2018

Now that the only places which modify flags are either (a) during
initialization prior to creating a netdevice, or (b) while holding the
rtnl lock, we no longer need the cmpxchg64 call in i40e_set_priv_flags.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

886ff146

i40e: hold the RTNL lock while changing interrupt schemes · f0ee70a0

Jacob Keller authored Mar 16, 2018

When we suspend and resume, we need to clear and re-enable the interrupt
scheme. This was previously not done while holding the RTNL lock, which
could be problematic, because we are actually destroying and re-creating
queues.

Hold the RTNL lock for the entire sequence of preparing for reset, and
when resuming. This additionally protects the flags related to interrupt
scheme under RTNL lock so that their modification is properly threaded.

This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

f0ee70a0

i40e: move client flags into state bits · 5f76a704

Jacob Keller authored Mar 16, 2018

The iWarp client flags are all potentially changed when the RTNL lock is
not held, so they should not be part of the pf->flags variable. Instead,
move them into the state field so that we can use atomic bit operations.

This is part of a larger effort to remove cmpxchg64 in
i40e_set_priv_flags()
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

5f76a704

i40e: move I40E_FLAG_TEMP_LINK_POLLING to state field · 0605c45c

Jacob Keller authored Mar 16, 2018

This flag is modified outside of the RTNL lock and thus should not be
part of the pf->flags variable.

Use a state bit instead, so that we can use atomic bit operations.

This is part of a larger effort to remove cmpxchg64 in
i40e_set_priv_flags()
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

0605c45c

net/mlx5e: Add VLAN offload features to hw_enc_features · 71186172

Aviv Heller authored Aug 17, 2017

We support outer VLAN offload in driver and HW regardless of whether
an encapsulation is present in the next headers.

Exposing this in hw_enc_features will allow us to offload outer VLANs
in cases where encapsulation protocols like VXLAN and IPsec are used.
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

71186172

net/mlx5e: Add a helper macro in set features ndo · be0f780b

Gal Pressman authored Jan 11, 2018

Add a new macro to prevent copy-pasting the same code for each new
feature.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

be0f780b

net/mlx5e: Make choose LRO timeout function static · 707129dc

Gal Pressman authored Jan 31, 2018

The function is used in en_main.c only, so we can make it static and remove
its declaration from en.h.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

707129dc