Commits · b945245297416a3c68ed12f2ada1c7162f5f73fd · nexedi / linux

25 May, 2018 40 commits

nfp: flower: add per repr private data for LAG offload · b9452452

John Hurley authored May 23, 2018

Add a bitmap to each flower repr to track its state if it is enslaved by a
bond. This LAG state may be different to the port state - for example, the
port may be up but LAG state may be down due to the selection in an
active/backup bond.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b9452452

nfp: flower: check for/turn on LAG support in firmware · 898bc7d6

John Hurley authored May 23, 2018

Check if the fw contains the _abi_flower_balance_sync_enable symbol. If it
does then write a 1 to this indicating that the driver is willing to
receive NIC to kernel LAG related control messages.

If the write is successful, update the list of extra features supported by
the fw and add a stub to accept LAG cmsgs.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

898bc7d6

nfp: nfpcore: add rtsym writing function · 1945ca7a

John Hurley authored May 23, 2018

Add an rtsym API function that combines the lookup of a symbol and the
writing of a value to it. Values can be written as unsigned 32 or 64 bits.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1945ca7a

nfp: add ndo_set_mac_address for representors · 24f132e2

John Hurley authored May 23, 2018

Adding a netdev to a bond requires that its mac address can be modified.
The default eth_mac_addr is sufficient to satisfy this requirement.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24f132e2

hv_netvsc: fix bogus ifalias on network device · d97cde6a

Stephen Hemminger authored May 23, 2018

If the guest network adapter is not configured with DeviceNaming
enabled on the host, then the query for friendly name will return
success but with a zero length name. Which then leads to a garbage value
(stack contents) for ifalias.

Fix is simple, just don't set name if  host doesn't return it.

Fixes: 0fe554a4 ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d97cde6a

Merge branch 'net-Update-fib_table_lookup-tracepoints' · de0a8267

David S. Miller authored May 24, 2018

David Ahern says:

====================
net: Update fib_table_lookup tracepoints

Update the FIB lookup tracepoints to include ip proto and port fields
from the flow struct. In the process make the IPv4 tracepoint inline
with IPv6 which is much easier to use and follow the lookup and result.

Remove the tracepoint in fib_validate_source which does not provide
value above the fib_table_lookup which immediately follows it.

v2
- move CREATE_TRACE_POINTS for the v6 tracepoint to route.c to handle
  its need for an internal function to convert route type to error and
  handle IPv6 as a module or builtin. Reported by kbuild robot.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

de0a8267

net/ipv4: Remove tracepoint in fib_validate_source · c949cbbb

David Ahern authored May 23, 2018

Tracepoint does not add value and the call to fib_lookup follows
it which shows the same information and the fib lookup result.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c949cbbb

net/ipv6: Udate fib6_table_lookup tracepoint · 30d444d3

David Ahern authored May 23, 2018

Commit bb0ad198 ("ipv6: fib6_rules: support for match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30d444d3

net/ipv4: Udate fib_table_lookup tracepoint · 9f323973

David Ahern authored May 23, 2018

Commit 4a2d73a4 ("ipv4: fib_rules: support match on sport, dport
and ip proto") added support for protocol and ports to FIB rules.
Update the FIB lookup tracepoint to dump the parameters.

In addition, make the IPv4 tracepoint similar to the IPv6 one where
the lookup parameters and result are dumped in 1 event. It is much
easier to use and understand the outcome of the lookup.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f323973

net_sched: switch to rcu_work · aaa908ff

Cong Wang authored May 23, 2018

Commit 05f0fe6b ("RCU, workqueue: Implement rcu_work") introduces
new API's for dispatching work in a RCU callback. Now we can just
switch to the new API's for tc filters. This could get rid of a lot
of code.

Cc: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aaa908ff

Merge branch 'Mirroring-tests-involving-VLAN' · 1bb58d2d

David S. Miller authored May 24, 2018

Petr Machata says:

====================
Mirroring tests involving VLAN

This patchset tests mirror-to-gretap with various underlay
configurations involving VLAN netdevice in particular. Some of the tests
involve bridges as well, but tests aimed specifically at testing bridges
(i.e. FDB, STP) are not part of this patchset.

In patches #1-#6, the codebase is adapted to support the new tests.

In patch #7, a test for mirroring to VLAN is introduced.

Patches #8-#10 add three tests where VLAN is part of underlay path after
gretap encapsulation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

1bb58d2d

selftests: forwarding: Test mirror-to-gre w/ UL 802.1d+VLAN · 181d95f8

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at an 802.1d bridge and packet egresses through a
VLAN device.

Besides testing basic connectivity, this also tests that the traffic is
properly tagged.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

181d95f8

selftests: forwarding: Test mirror-to-gre w/ UL VLAN · a08fb9f1

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to a gretap
netdevice whose underlay route points at a vlan device.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a08fb9f1

selftests: forwarding: Test mirror-to-gre w/ UL VLAN+802.1q · 0056042f

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to GRE when the
underlay route points at a vlan device on top of a bridge device with
vlan filtering (802.1q).
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0056042f

selftests: forwarding: Test mirror-to-vlan · 35388a6a

Petr Machata authored May 24, 2018

Test for "tc action mirred egress mirror" that mirrors to a vlan device.
- test_vlan() tests that the packets get mirrored
- test_tagged_vlan() tests that the mirrored packets have correct inner
  VLAN tag.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35388a6a

selftests: forwarding: lib: Extract trap_{, un}install() · 87c0c046

Petr Machata authored May 24, 2018

A mirror-to-vlan test that's coming next needs to install the trap
unconditionally. Therefore extract from slow_path_trap_{,un}install()
a more generic functions trap_install() and trap_uninstall(), and covert
the former two to conditional wrappers around these.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87c0c046

selftests: forwarding: mirror_gre_lib: Support VLAN · 1893150f

Petr Machata authored May 24, 2018

Add full_test_span_gre_dir_vlan_ips() and full_test_span_gre_dir_vlan()
to support mirror-to-gre tests that involve VLAN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1893150f

selftests: forwarding: lib: Support VLAN devices · 0e7a504c

Petr Machata authored May 24, 2018

Add vlan_create() and vlan_destroy() to manage VLAN netdevices.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e7a504c

selftests: forwarding: Add $h3's clsact to mirror_topo_lib.sh · 91bac7f9

Petr Machata authored May 24, 2018

Having a clsact qdisc on $h3 is useful in several tests, and will be
useful in more tests to come. Move the registration from all the tests
that need it into the topology file itself.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

91bac7f9

selftests: forwarding: mirror_gre_lib: Extract generic functions · d5ea2bfc

Petr Machata authored May 24, 2018

For non-GRE mirroring tests, a functions along the lines of
do_test_span_gre_dir_ips() and test_span_gre_dir_ips() are necessary,
but such that they don't assume tunnels are involved. Extract the code
from mirror_gre_lib.sh to mirror_lib.sh and convert to just use a given
device without assuming it's named "h3-$tundev". Convert the two
above-mentioned functions to wrappers that pass along the correct device
name.

Add test_span_dir() and fail_test_span_dir() to round up the API for use
by following patches.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5ea2bfc

selftests: forwarding: Split mirror_gre_topo_lib.sh · 74ed089d

Petr Machata authored May 24, 2018

Move generic parts of mirror_gre_topo_lib.sh into a new file
mirror_topo_lib.sh. Reuse the functions in GRE topo, adding the tunnel
devices as necessary.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74ed089d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 90fed9c9

David S. Miller authored May 24, 2018

Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-05-24

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Björn Töpel cleans up AF_XDP (removes rebind, explicit cache alignment from uapi, etc).

2) David Ahern adds mtu checks to bpf_ipv{4,6}_fib_lookup() helpers.

3) Jesper Dangaard Brouer adds bulking support to ndo_xdp_xmit.

4) Jiong Wang adds support for indirect and arithmetic shifts to NFP

5) Martin KaFai Lau cleans up BTF uapi and makes the btf_header extensible.

6) Mathieu Xhonneux adds an End.BPF action to seg6local with BPF helpers allowing
   to edit/grow/shrink a SRH and apply on a packet generic SRv6 actions.

7) Sandipan Das adds support for bpf2bpf function calls in ppc64 JIT.

8) Yonghong Song adds BPF_TASK_FD_QUERY command for introspection of tracing events.

9) other misc fixes from Gustavo A. R. Silva, Sirio Balmelli, John Fastabend, and Magnus Karlsson
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

90fed9c9

Merge branch 'ibmvnic-Failover-hardening' · 49a473f5

David S. Miller authored May 24, 2018

Thomas Falcon says:

====================
ibmvnic: Failover hardening

Introduce additional transport event hardening to handle
events during device reset. In the driver's current state,
if a transport event is received during device reset, it can
cause the device to become unresponsive as invalid operations
are processed as the backing device context changes. After
a transport event, the device expects a request to begin the
initialization process. If the driver is still processing
a previously queued device reset in this state, it is likely
to fail as firmware will reject any commands other than the
one to initialize the client driver's Command-Response Queue.

Instead of failing and becoming dormant, the driver will make
one more attempt to recover and continue operation. This is
achieved by setting a state flag, which if true will direct
the driver to clean up all allocated resources and perform
a hard reset in an attempt to bring the driver back to an
operational state.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

49a473f5

ibmvnic: Introduce hard reset recovery · 2770a798

Thomas Falcon authored May 23, 2018

Introduce a recovery hard reset to handle reset failure as a result of
change of device context following a transport event, such as a
backing device failover or partition migration. These operations reset
the device context to its initial state. If this occurs during a reset,
any initialization commands are likely to fail with an invalid state
error as backing device firmware requests reinitialization.

When this happens, make one more attempt by performing a hard reset,
which frees any resources currently allocated and performs device
initialization. If a transport event occurs during a device reset, a
flag is set which will trigger a new hard reset following the
completionof the current reset event.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2770a798

ibmvnic: Set resetting state at earliest possible point · 06e43d7f

Thomas Falcon authored May 23, 2018

Set device resetting state at the earliest possible point: as soon as a
reset is successfully scheduled. The reset state is toggled off when
all resets have been processed to completion.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06e43d7f

ibmvnic: Create separate initialization routine for resets · 8a348450

Thomas Falcon authored May 23, 2018

Instead of having one initialization routine for all cases, create
a separate, simpler function for standard initialization, such as during
device probe. Use the original initialization function to handle
device reset scenarios. The goal of this patch is to avoid having
a single, cluttered init function to handle all possible
scenarios.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a348450

ibmvnic: Handle error case when setting link state · ab5ec33b

Thomas Falcon authored May 23, 2018

If setting the link state is not successful, print a warning
with the resulting return code and return it to be handled
by the caller.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ab5ec33b

ibmvnic: Return error code if init interrupted by transport event · 17c87058

Thomas Falcon authored May 23, 2018

If device init is interrupted by a failover, set the init return
code so that it can be checked and handled appropriately by the
init routine.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17c87058

ibmvnic: Check CRQ command return codes · 9c4eaabd

Thomas Falcon authored May 23, 2018

Check whether CRQ command is successful before awaiting a response
from the management partition. If the command was not successful, the
driver may hang waiting for a response that will never come.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c4eaabd

ibmvnic: Introduce active CRQ state · 5153698e

Thomas Falcon authored May 23, 2018

Introduce an "active" state for a IBM vNIC Command-Response Queue. A CRQ
is considered active once it has initialized or linked with its partner by
sending an initialization request and getting a successful response back
from the management partition. Until this has happened, do not allow CRQ
commands to be sent other than the initialization request.

This change will avoid a protocol error in case of a device transport
event occurring during a initialization. When the driver receives a
transport event notification indicating that the backing hardware
has changed and needs reinitialization, any further commands other
than the initialization handshake with the VIOS management partition
will result in an invalid state error. Instead of sending a command
that will be returned with an error, print a warning and return an
error that will be handled by the caller.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5153698e

ibmvnic: Mark NAPI flag as disabled when released · c3f22415

Thomas Falcon authored May 23, 2018

Set adapter NAPI state as disabled if they are removed. This will allow
them to be enabled again if reallocated in case of a hard reset.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3f22415

Merge branch 'gretap-mirroring-selftests' · 180f848b

David S. Miller authored May 24, 2018

Petr Machata says:

====================
selftests: forwarding: Additions to mirror-to-gretap tests

This patchset is for a handful of edge cases in mirror-to-gretap
scenarios: removal of mirrored-to netdevice (#1), removal of underlay
route for tunnel remote endpoint (#2) and cessation of mirroring upon
removal of flower mirroring rule (#3).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

180f848b

selftests: forwarding: Test removal of mirroring · a96d81a2

Petr Machata authored May 23, 2018

Test that when flower-based mirror action is removed, mirroring stops.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a96d81a2

selftests: forwarding: Test removal of underlay route · 77a8df38

Petr Machata authored May 23, 2018

When underlay route is removed, the mirrored traffic should not be
forwarded.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77a8df38

selftests: forwarding: Test mirroring to deleted device · 6b45432d

Petr Machata authored May 23, 2018

Tests that the mirroring code catches up with deletion of a mirrored-to
device.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b45432d

cxgb4: Check for kvzalloc allocation failure · d624613e

YueHaibing authored May 22, 2018

t4_prep_fw doesn't check for card_fw pointer before store the read data,
which could lead to a NULL pointer dereference if kvzalloc failed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d624613e

Merge branch 'xdp_xmit-bulking' · 10f67868

Alexei Starovoitov authored May 24, 2018

Jesper Dangaard Brouer says:

====================
This patchset change ndo_xdp_xmit API to take a bulk of xdp frames.

When kernel is compiled with CONFIG_RETPOLINE, every indirect function
pointer (branch) call hurts performance. For XDP this have a huge
negative performance impact.

This patchset reduce the needed (indirect) calls to ndo_xdp_xmit, but
also prepares for further optimizations.  The DMA APIs use of indirect
function pointer calls is the primary source the regression.  It is
left for a followup patchset, to use bulking calls towards the DMA API
(via the scatter-gatter calls).

The other advantage of this API change is that drivers can easier
amortize the cost of any sync/locking scheme, over the bulk of
packets.  The assumption of the current API is that the driver
implemementing the NDO will also allocate a dedicated XDP TX queue for
every CPU in the system.  Which is not always possible or practical to
configure. E.g. ixgbe cannot load an XDP program on a machine with
more than 96 CPUs, due to limited hardware TX queues.  E.g. virtio_net
is hard to configure as it requires manually increasing the
queues. E.g. tun driver chooses to use a per XDP frame producer lock
modulo smp_processor_id over avail queues.

I'm considered adding 'flags' to ndo_xdp_xmit, but it's not part of
this patchset.  This will be a followup patchset, once we know if this
will be needed (e.g. for non-map xdp_redirect flush-flag, and if
AF_XDP chooses to use ndo_xdp_xmit for TX).

---
V5: Fixed up issues spotted by Daniel and John

V4: Splitout the patches from 4 to 8 patches.  I cannot split the
driver changes from the NDO change, but I've tried to isolated the NDO
change together with the driver change as much as possible.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

10f67868

samples/bpf: xdp_monitor use err code from tracepoint xdp:xdp_devmap_xmit · a570e48f

Jesper Dangaard Brouer authored May 24, 2018

Update xdp_monitor to use the recently added err code introduced
in tracepoint xdp:xdp_devmap_xmit, to show if the drop count is
caused by some driver general delivery problem.  Other kind of drops
will likely just be more normal TX space issues.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

a570e48f

xdp/trace: extend tracepoint in devmap with an err · e74de52e

Jesper Dangaard Brouer authored May 24, 2018

Extending tracepoint xdp:xdp_devmap_xmit in devmap with an err code
allow people to easier identify the reason behind the ndo_xdp_xmit
call to a given driver is failing.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e74de52e

xdp: change ndo_xdp_xmit API to support bulking · 735fc405

Jesper Dangaard Brouer authored May 24, 2018

This patch change the API for ndo_xdp_xmit to support bulking
xdp_frames.

When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
Most of the slowdown is caused by DMA API indirect function calls, but
also the net_device->ndo_xdp_xmit() call.

Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
performance improved:
 for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
 for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps

With frames avail as a bulk inside the driver ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.

Testing without CONFIG_RETPOLINE show the same performance for
physical NIC drivers.

The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.

V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
V4: Isolated ndo, driver changes and callers.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

735fc405