Commits · f1741730dd18828fe3ea5fa91c22f41cf001c625 · nexedi / linux

29 Mar, 2019 23 commits

net: Add fib_nh_common and update fib_nh and fib6_nh · f1741730

David Ahern authored Mar 27, 2019

Add fib_nh_common struct with common nexthop attributes. Convert
fib_nh and fib6_nh to use it. Use macros to move existing
fib_nh_* references to the new nh_common.nhc_*.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
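
The macro-based move described above can be sketched in a few lines of C. This is a simplified illustration, not the kernel's definition: the struct names, the field types, and the `nh4_*` macro names are assumptions chosen for the example. Only the `nh_common.nhc_*` naming pattern comes from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the common nexthop attributes. */
struct nh_common_sketch {
	int           nhc_oif;    /* output interface index */
	unsigned char nhc_flags;  /* nexthop flags */
	unsigned int  nhc_gw;     /* gateway (IPv4-sized here for brevity) */
};

/* Protocol-specific nexthop that embeds the common part. */
struct nh4_sketch {
	struct nh_common_sketch nh_common;
	int nh4_private;          /* stand-in for protocol-only state */
};

/* Old field names keep working by expanding to the embedded member. */
#define nh4_oif   nh_common.nhc_oif
#define nh4_flags nh_common.nhc_flags
#define nh4_gw    nh_common.nhc_gw

static int nh_common_demo(void)
{
	struct nh4_sketch nh;

	memset(&nh, 0, sizeof(nh));
	nh.nh4_oif = 3;            /* expands to nh.nh_common.nhc_oif */
	nh.nh4_gw = 0x0a000001u;   /* expands to nh.nh_common.nhc_gw */

	assert(nh.nh_common.nhc_oif == 3);
	assert(nh.nh_common.nhc_gw == 0x0a000001u);
	return 0;
}
```

Existing callers compile unchanged because the preprocessor rewrites the old member name into a path through the embedded struct, which is why such a conversion can be done without touching every user at once.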

ipv6: Rename fib6_nh entries · ad1601ae

David Ahern authored Mar 27, 2019

Rename fib6_nh entries that will be moved to a fib_nh_common struct.
Specifically, the device, gateway, flags, and lwtstate are common
with all nexthop definitions. In some places new temporary variables
are declared or local variables renamed to maintain line lengths.

Rename only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Rename fib_nh entries · b75ed8b1

David Ahern authored Mar 27, 2019

Rename fib_nh entries that will be moved to a fib_nh_common struct.
Specifically, the device, oif, gateway, flags, scope, lwtstate,
nh_weight and nh_upper_bound are common with all nexthop definitions.
In the process shorten fib_nh_lwtstate to fib_nh_lws to avoid really
long lines.

Rename only; no functional change intended.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Change rt6_add_nexthop and rt6_nexthop_info to take fib6_nh · 572bf4dd

David Ahern authored Mar 27, 2019

rt6_add_nexthop and rt6_nexthop_info only need the fib6_info for the
gateway flag and the nexthop weight, and the presence of a gateway is now
per-nexthop. Update the signatures to take a fib6_nh and nexthop weight
and better align with the ipv4 versions.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Refactor fib6_ignore_linkdown · 6d3d07b4

David Ahern authored Mar 27, 2019

fib6_ignore_linkdown takes a fib6_info but only looks at the net_device
and its IPv6 config. Change it to take a net_device instead of a
fib6_info as its input argument.

In addition, move it to a header file to make the check inline and usable
later with IPv4 code without going through the ipv6 stub, and rename to
ip6_ignore_linkdown since it is only checking the setting based on the
ipv6 struct on a device.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
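
A rough sketch of the refactored check, taking a device rather than a route entry. The struct layouts and names below are hypothetical stand-ins, and the NULL handling is an assumption of this example rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device IPv6 configuration. */
struct inet6_cfg_sketch {
	int ignore_routes_with_linkdown;
};

/* Hypothetical device; ip6_cfg is NULL when IPv6 is not set up on it. */
struct dev6_sketch {
	struct inet6_cfg_sketch *ip6_cfg;
};

/* Inline header-style helper: needs only the device, not a route entry. */
static inline bool ignore_linkdown_sketch(const struct dev6_sketch *dev)
{
	const struct inet6_cfg_sketch *cfg = dev->ip6_cfg;

	return cfg != NULL && cfg->ignore_routes_with_linkdown != 0;
}

static int ignore_linkdown_demo(void)
{
	struct inet6_cfg_sketch on = { .ignore_routes_with_linkdown = 1 };
	struct dev6_sketch with_cfg = { .ip6_cfg = &on };
	struct dev6_sketch without_cfg = { .ip6_cfg = NULL };

	assert(ignore_linkdown_sketch(&with_cfg));
	assert(!ignore_linkdown_sketch(&without_cfg));
	return 0;
}
```

Because the helper depends only on the device, any caller that holds a device pointer can use it, regardless of which route structure it is working with.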

ipv6: Move gateway checks to a fib6_nh setting · 2b2450ca

David Ahern authored Mar 27, 2019

The gateway setting is not per fib6_info entry but per-fib6_nh. Add a new
fib_nh_has_gw flag to fib6_nh and convert references to RTF_GATEWAY to
the new flag. For an IPv6 address, checking the flag is cheaper than
checking that nh_gw is non-zero, as IPv4 does.

While this increases fib6_nh by 8 bytes, the effective allocation size of
a fib6_info is unchanged. The 8 bytes are recovered later with a
fib_nh_common change.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
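
To illustrate why a flag is cheaper than inspecting a 128-bit address, here is a minimal sketch comparing the two checks. The struct and function names are hypothetical; only the idea of a per-nexthop "has gateway" flag comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical IPv6 nexthop: 16-byte gateway plus a presence flag. */
struct nh6_gw_sketch {
	unsigned char gw[16];
	bool          has_gw;
};

/* Address-based check: compares all 16 bytes against zero. */
static bool gw_present_by_compare(const struct nh6_gw_sketch *nh)
{
	static const unsigned char zero[16];

	return memcmp(nh->gw, zero, sizeof(zero)) != 0;
}

/* Flag-based check: a single byte load. */
static bool gw_present_by_flag(const struct nh6_gw_sketch *nh)
{
	return nh->has_gw;
}

static int has_gw_demo(void)
{
	struct nh6_gw_sketch nh;

	memset(&nh, 0, sizeof(nh));
	assert(!gw_present_by_compare(&nh));
	assert(!gw_present_by_flag(&nh));

	/* fe80::1, set together with the flag */
	nh.gw[0] = 0xfe;
	nh.gw[1] = 0x80;
	nh.gw[15] = 0x01;
	nh.has_gw = true;

	assert(gw_present_by_compare(&nh));
	assert(gw_present_by_flag(&nh));
	return 0;
}
```

For a 32-bit IPv4 gateway the non-zero test is already a single comparison, which is why the trade-off differs between the two protocols.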

ipv6: Create cleanup helper for fib6_nh · dac7d0f2

David Ahern authored Mar 27, 2019

Move the fib6_nh cleanup code to a new helper, fib6_nh_release.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Create init helper for fib6_nh · 83c44251

David Ahern authored Mar 27, 2019

Similar to IPv4, consolidate the fib6_nh initialization into a helper.
As a new standalone function, add a cleanup path to put lwtstate on
error.

To avoid modifying fib6_config flags, move the reject check to a helper
that is invoked once by fib6_nh_init to reset the device and then
again in ip6_route_info_create to set the fib6_flags.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Create cleanup helper for fib_nh · faa041a4

David Ahern authored Mar 27, 2019

Move the fib_nh cleanup code from free_fib_info_rcu into a new helper,
fib_nh_release. Move classid accounting into fib_nh_release which is
called per fib_nh to make accounting symmetrical with fib_nh_init.
Export the helper to allow for use with nexthop objects in the
future.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Create init helper for fib_nh · e4516ef6

David Ahern authored Mar 27, 2019

Consolidate the fib_nh initialization which is duplicated between
fib_create_info for single path and fib_get_nhs for multipath.
Export the helper to allow for use with nexthop objects in the
future.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Move IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN to helper · 331c7a40

David Ahern authored Mar 27, 2019

in_dev lookup followed by IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN check
is called in several places, some with the rcu lock and others with the
rtnl held.

Move the check to a helper similar to what IPv6 has. Since the helper
can be invoked from either context use rcu_dereference_rtnl to
dereference ip_ptr.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Define fib_get_nhs when CONFIG_IP_ROUTE_MULTIPATH is disabled · 8373c6c8

David Ahern authored Mar 27, 2019

Define fib_get_nhs to return EINVAL when CONFIG_IP_ROUTE_MULTIPATH is
not enabled and remove the ifdef check for CONFIG_IP_ROUTE_MULTIPATH
in fib_create_info.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
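
The stub pattern described above can be sketched as follows. `SKETCH_ROUTE_MULTIPATH` is a hypothetical macro standing in for CONFIG_IP_ROUTE_MULTIPATH, and both function names and their arguments are illustrative, not the kernel's signatures.

```c
#include <assert.h>
#include <errno.h>

#ifdef SKETCH_ROUTE_MULTIPATH
/* Real implementation would parse the nexthops; heavily simplified here. */
static int get_nhs_sketch(int nhs)
{
	return nhs > 0 ? 0 : -EINVAL;
}
#else
/* Feature compiled out: always reject multipath requests. */
static int get_nhs_sketch(int nhs)
{
	(void)nhs;
	return -EINVAL;
}
#endif

/* The caller no longer needs an #ifdef around the call site. */
static int create_info_sketch(int nhs, int has_multipath_attr)
{
	if (has_multipath_attr)
		return get_nhs_sketch(nhs);
	return 0;
}

static int get_nhs_demo(void)
{
#ifndef SKETCH_ROUTE_MULTIPATH
	/* With the feature disabled, a multipath request is rejected. */
	assert(create_info_sketch(2, 1) == -EINVAL);
#endif
	/* A single-path request is unaffected either way. */
	assert(create_info_sketch(1, 0) == 0);
	return 0;
}
```

Keeping the conditional compilation next to the helper definition leaves the caller with a single code path to read and test.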

Merge branch 'selftests-forwarding-Add-new-test-cases' · 113e59d0

David S. Miller authored Mar 28, 2019

Ido Schimmel says:

====================
selftests: forwarding: Add new test cases

This patchset mainly adds new forwarding test cases and performs small
changes in existing infrastructure.

Patches #1-#3 add new test cases for multicast RPF check, PCP and VLAN
matching using flower and tc VLAN modify action.

The rest of the patches are from Petr who says:

In patches #4 and #5, devlink_lib.sh is fixed to first not cause double
inclusion of lib.sh, and then to deduce the device name in a simpler way.

In patch #6, helpers for dealing with shared buffer configuration are
added to devlink_lib.sh.

In patch #7, MC-awareness test is fixed to configure shared buffers
explicitly.

In patch #8, several helpers are extracted from the MC-awareness test
and put into a new mlxsw-specific library, qos_lib.sh.

In patch #9, a new test is added which checks configuration of
strictly-prioritized streams.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add a new test for strict priority · 30905dc6

Petr Machata authored Mar 28, 2019

Test that when strict priority is configured on a system, the
higher-priority traffic does actually win all the available bandwidth.
The test uses a similar approach to qos_mc_aware.sh to run and account
the traffic.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: Add qos_lib.sh · 573363a6

Petr Machata authored Mar 28, 2019

Extract reusable code from qos_mc_aware.sh and put into a new library.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mlxsw: qos_mc_aware: Configure shared buffers · 5dde21b3

Petr Machata authored Mar 28, 2019

This test runs two streams of traffic from two independent ports to
create congestion on one egress port. It is necessary to configure the
shared buffer thresholds correctly, to make sure that there is traffic
from both streams in the shared buffer. Only then can the test actually
test prioritization among these streams.

Without this configuration, it is possible that one of the streams
takes all of the port-pool quota and the other stream is not even
admitted, thus invalidating the result.

On Spectrum-1, this is not a problem, because MC traffic uses a separate
pool. But for Spectrum-2, MC and UC share the same pool, and the correct
configuration is important.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: devlink_lib: Add shared buffer helpers · d04cc726

Petr Machata authored Mar 28, 2019

Add helpers to obtain, set, and restore a pool size, and a port-pool and
tc-pool threshold.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: devlink_lib: Simplify deduction of DEVLINK_DEV · 8e46aee6

Petr Machata authored Mar 28, 2019

Use devlink -j and jq for more accurate querying. Use cut -f-2 instead
of rev-cut-rev combo.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: devlink_lib: Avoid double sourcing of lib.sh · 2cca8751

Petr Machata authored Mar 28, 2019

Don't source lib.sh twice and make the script work with ifnames passed
on the command line.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Test action VLAN modify · 2fcbc0b1

Danielle Ratson authored Mar 28, 2019

Construct a basic topology consisting of two hosts connected using a
VLAN-aware bridge. Put each port in a different VLAN and test that ping
fails.

Add ingress and egress filters with a VLAN modify action and test that
ping passes.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add PCP match and VLAN match tests · 0637e1f8

Amit Cohen authored Mar 28, 2019

Send packets with VLAN and PCP set and check that TC flower filters can
match on these keys.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: forwarding: Add reverse path forwarding (RPF) test cases · ca059af8

Ido Schimmel authored Mar 28, 2019

In case a packet is routed using a multicast route whose specified
ingress interface does not match the interface from which the packet was
received, the packet is dropped.

Add IPv4 and IPv6 test cases for above mentioned scenario.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Add 2500BaseT support · eda3d1b0

Maxime Chevallier authored Mar 27, 2019

Some PHYs will use the 2500BaseX PHY_INTERFACE_MODE when being linked
with a partner using 2.5GBaseT.

Since we can't autonegotiate this speed between the MAC and the PHY, we
need to have the proper comphy support enabled, to make sure we can
safely advertise 2.5G and 1G in BaseT and be able to switch between both
corresponding PHY interface modes. This is now possible since comphy
support was added to this driver.

This commit adds the 2500BaseT mode to the list of supported modes when
using 2500BaseX, and was tested on a setup with an Armada385 and a
88E2010 PHY, both with and without the comphy node in the DT.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Mar, 2019 17 commits

openvswitch: Add timeout support to ct action · 06bd2bdf

Yi-Hung Wei authored Mar 26, 2019

Add fine-grained timeout support to the conntrack action.
The new OVS_CT_ATTR_TIMEOUT attribute of the conntrack action
specifies a timeout to be associated with this connection.
If no timeout is specified, behavior is unchanged: the default
timeout for the connection is applied automatically.

Example usage:
$ nfct timeout add timeout_1 inet tcp syn_sent 100 established 200
$ ovs-ofctl add-flow br0 in_port=1,ip,tcp,action=ct(commit,timeout=timeout_1)

CC: Pravin Shelar <pshelar@ovn.org>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: Export nf_ct_{set,destroy}_timeout() · 717700d1

Yi-Hung Wei authored Mar 26, 2019

This patch exports nf_ct_set_timeout() and nf_ct_destroy_timeout().
The two functions are derived from xt_ct_destroy_timeout() and
xt_ct_set_timeout() in xt_CT.c, and moved to nf_conntrack_timeout.c
without any functional change.
This is useful for other users (e.g. OVS) that use the
finer-grained conntrack timeout feature.

CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-next' · c63d11ba

David S. Miller authored Mar 28, 2019

Julian Wiedmann says:

====================
s390/qeth: updates 2019-03-28

Please apply the following patchset to net-next. This reworks the control
IO code in qeth so that we no longer need to poll for cmd completion,
and refactors the IDX setup code to also use this improved IO path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: send IDX cmds via qeth_send_control_data() · 2e873d10

Julian Wiedmann authored Mar 28, 2019

This converts the IDX code to use qeth_send_control_data(), replacing
a bunch of duplicated IO code and unbounded waits. It also allows the
IDX sequence to benefit from the improved timeout & notify
infrastructure, so that we can eliminate the DOWN -> ACTIVATING -> UP
transition in the channel state machine.

The patch looks rather big, but most of it is a straightforward
conversion of the old IDX cmd setup & callbacks to the new model.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: use callback to finalize cmd · 48ce6f89

Julian Wiedmann authored Mar 28, 2019

To avoid concurrency issues, some parts of the cmd setup are delayed
until qeth_send_control_data() holds the IO channel's irq_pending
"lock". Rather than hard-coding those setup steps for each cmd type,
have the cmd provide a callback. This will make it easier to also issue
IDX commands via qeth_send_control_data().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
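
The callback approach can be sketched roughly as below. Everything here is a hypothetical model: a plain `busy` flag stands in for the channel's irq_pending "lock" (no real atomicity), and stamping a sequence number is only an assumed example of a setup step that has to wait until the channel is held.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct cmd_sketch;

/* Per-command hook that performs the late, channel-dependent setup. */
typedef void (*finalize_fn)(struct cmd_sketch *cmd, unsigned int seqno);

struct cmd_sketch {
	unsigned int seqno;
	finalize_fn  finalize;    /* may be NULL if no late setup is needed */
};

struct channel_sketch {
	int          busy;        /* stand-in for the irq_pending "lock" */
	unsigned int next_seqno;
};

static void stamp_seqno(struct cmd_sketch *cmd, unsigned int seqno)
{
	cmd->seqno = seqno;
}

static int send_cmd_sketch(struct channel_sketch *ch, struct cmd_sketch *cmd)
{
	if (ch->busy)
		return -EBUSY;
	ch->busy = 1;

	/* The sender stays generic: each command brings its own late setup. */
	if (cmd->finalize)
		cmd->finalize(cmd, ch->next_seqno++);

	/* ... the IO would be started here ... */
	ch->busy = 0;
	return 0;
}

static int finalize_demo(void)
{
	struct channel_sketch ch = { .busy = 0, .next_seqno = 7 };
	struct cmd_sketch cmd = { .seqno = 0, .finalize = stamp_seqno };

	assert(send_cmd_sketch(&ch, &cmd) == 0);
	assert(cmd.seqno == 7 && ch.next_seqno == 8);

	ch.busy = 1;
	assert(send_cmd_sketch(&ch, &cmd) == -EBUSY);
	return 0;
}
```

A new command type then only needs to supply its own hook; the send path itself does not grow another special case.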

s390/qeth: let qeth_notify_reply() set the notify reason · 61e04465

Julian Wiedmann authored Mar 28, 2019

As trivial cleanup before adding more users to qeth_notify_reply(),
move the setup of reply->rc from the caller into the helper.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: clarify default cmd callback · 988a747d

Julian Wiedmann authored Mar 28, 2019

Current code makes it look like qeth_send_control_data_cb() is some
sort of default callback for all cmds. But in practice, it is only used
for half of the cmd buffers we issue.
Reduce the confusion by only setting this callback for cmds that
actually want it, and while at it give the callback a name that matches
the established naming scheme.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: don't poll for cmd IO completion · 782e4a79

Julian Wiedmann authored Mar 28, 2019

All callers are running in process context now, so we can safely sleep
in qeth_send_control_data() while waiting for a cmd to complete.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: convert IP table spinlock to mutex · df2a2a52

Julian Wiedmann authored Mar 28, 2019

All users of the lock are running in process context now.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: defer IPv6 address notifier events · 7686e4b6

Julian Wiedmann authored Mar 28, 2019

The inet6addr_chain is atomic. So instead of starting the cmd IO for
SETIP / DELIP straight from the notifier callback, run it from a
workqueue. This is the last step towards removal of cmd IO completion
polling.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: add wrapper for IP table access · 05a17851

Julian Wiedmann authored Mar 28, 2019

Extract a little helper, so that high-level callers can manipulate the
IP table without worrying about the locking. This will make it easier
to convert the code to a different locking primitive later on.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: remove locking for RX modeset cache · 5c0aebc6

Julian Wiedmann authored Mar 28, 2019

The L2 and L3 .ndo_set_rx_mode callbacks maintain an address cache
to decide which addresses have changed since the last modeset.

When the card is set offline, qeth_l?_stop_card() drains this cache.
This happens only after 1) the net_device has been detached, and
2) any pending RX modeset has completed. Consequently we can access the
cache lock-free.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: defer RX modesetting · d0c74825

Julian Wiedmann authored Mar 28, 2019

.ndo_set_rx_mode gets called in process context, but while holding the
addr_list spinlock. This means we currently can't sleep while
re-programming the HW, and need to poll for IO completion. That's bad,
in particular since receiving the cmd response can fail silently and
we're then polling until the timeout hits.

As a first step towards eliminating the IO completion polling, run the
RX modeset from a work element and only take the addr_list lock while
updating the RX mode address cache.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-call-for-phys_port_name-into-devlink-directly-if-possible' · 1571e2fd

David S. Miller authored Mar 28, 2019

Jiri Pirko says:

===================
net: call for phys_port_name into devlink directly if possible

phys_port_name may be assembled by a helper in devlink. It is currently
the case only for mlxsw driver. Benefit from the get_devlink_port ndo
and call into devlink directly from dev_get_phys_port_name(). That saves
the trip to the driver, simplifies the code and makes it similar to
recently introduced ethtool-devlink compat helpers.

Move bnxt, partly nfp and dsa to let devlink core generate the name too.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
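
The lookup order described in this series can be sketched as follows: ask the core first when a port is registered, and fall back to the driver callback otherwise. All types and function names are hypothetical stand-ins, and the `p<number>` name format is illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical port attributes registered with the core. */
struct port_attrs_sketch {
	int          set;          /* non-zero once attributes are provided */
	unsigned int port_number;
};

struct pp_netdev_sketch {
	struct port_attrs_sketch *port;   /* NULL if no port is registered */
	int (*get_phys_port_name)(struct pp_netdev_sketch *dev,
				  char *name, size_t len);
};

/* Core-side name assembly; no trip into the driver. */
static int core_port_name(const struct port_attrs_sketch *p,
			  char *name, size_t len)
{
	int n;

	if (!p->set)
		return -EOPNOTSUPP;
	n = snprintf(name, len, "p%u", p->port_number);
	return (n < 0 || (size_t)n >= len) ? -EINVAL : 0;
}

static int phys_port_name_sketch(struct pp_netdev_sketch *dev,
				 char *name, size_t len)
{
	if (dev->port) {
		int err = core_port_name(dev->port, name, len);

		if (err != -EOPNOTSUPP)
			return err;
	}
	if (!dev->get_phys_port_name)
		return -EOPNOTSUPP;
	return dev->get_phys_port_name(dev, name, len);
}

static int drv_name(struct pp_netdev_sketch *dev, char *name, size_t len)
{
	(void)dev;
	snprintf(name, len, "drv0");
	return 0;
}

static int phys_port_name_demo(void)
{
	struct port_attrs_sketch attrs = { .set = 1, .port_number = 4 };
	struct pp_netdev_sketch with_port = { .port = &attrs, .get_phys_port_name = NULL };
	struct pp_netdev_sketch legacy = { .port = NULL, .get_phys_port_name = drv_name };
	struct pp_netdev_sketch neither = { .port = NULL, .get_phys_port_name = NULL };
	char buf[16];

	assert(phys_port_name_sketch(&with_port, buf, sizeof(buf)) == 0);
	assert(strcmp(buf, "p4") == 0);
	assert(phys_port_name_sketch(&legacy, buf, sizeof(buf)) == 0);
	assert(strcmp(buf, "drv0") == 0);
	assert(phys_port_name_sketch(&neither, buf, sizeof(buf)) == -EOPNOTSUPP);
	return 0;
}
```

A driver that registers its port attributes can drop its own naming callback in this model, which matches the direction the follow-up commits take for bnxt, nfp and dsa.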

net: devlink: add warning for ndo_get_phys_port_name set when not needed · 746364f2

Jiri Pirko authored Mar 28, 2019

Currently, if the driver registers a devlink port instance, it should set
the devlink port attributes as well. The devlink core is then able to
obtain the physical port name itself, so there is no need for the driver
to implement the ndo. Once all drivers implement devlink port
registration, this ndo should be removed. This warning guides new
drivers to do things as they should be done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: do not handle nn->port defined case in nfp_net_get_phys_port_name() · f1fa719c

Jiri Pirko authored Mar 28, 2019

If nn->port is defined it means that devlink_port has been registered
for this port as well. Devlink core is handling the port name
formatting.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: do not support ndo_get_phys_port_name for non-legacy ports · d484210b

Jiri Pirko authored Mar 28, 2019

Since each non-legacy slave has its own devlink port instance
correctly set, rely on devlink core to generate correct phys port name.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
