Commits · 6d490f62a4c7f11c552591bdd08eda3636aa0db9 · nexedi / linux

10 Jun, 2016 19 commits

bgmac: Maintain some netdev statistics · 6d490f62

Florian Fainelli authored Jun 07, 2016

Add a few netdev statistics to report transmitted and received bytes and
packets and a few obvious errors.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6d490f62

bgmac: Add support for ethtool statistics · f6613d4f

Florian Fainelli authored Jun 07, 2016

Read the statistics from the BGMAC's builtin MAC and return them to
user-space using the standard ethtool helpers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6613d4f

bgmac: Bind net_device with backing device structure · 2022e9d5

Florian Fainelli authored Jun 07, 2016

In preparation for allowing different helpers to be utilized against
network devices created by the bgmac driver, make sure that we bind the
net_device with core->dev.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2022e9d5

virtio_net: Update the feature bit to comply with spec · 7d84e37e

Aaron Conole authored Jun 09, 2016

A draft version of the MTU Advice feature bit was specified as 25.  This
bit is not within the allowed range for network device feature bits, and
should be changed to be feature bit 3 to fully comply with the spec.

Fixes 14de9d11 ('virtio-net: Add initial MTU advice feature')
Signed-off-by: Aaron Conole <aconole@redhat.com>
Suggested-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d84e37e

net: vrf: Fix crash when IPv6 is disabled at boot time · e4348637

David Ahern authored Jun 09, 2016

Frank Kellermann reported a kernel crash with 4.5.0 when IPv6 is
disabled at boot using the kernel option ipv6.disable=1. Using
current net-next with the boot option:

$ ip link add red type vrf table 1001

Generates:
[12210.919584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000748
[12210.921341] IP: [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a
[12210.922537] PGD b79e3067 PUD bb32b067 PMD 0
[12210.923479] Oops: 0000 [#1] SMP
[12210.924001] Modules linked in: ipvlan 8021q garp mrp stp llc
[12210.925130] CPU: 3 PID: 1177 Comm: ip Not tainted 4.7.0-rc1+ #235
[12210.926168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[12210.928065] task: ffff8800b9ac4640 ti: ffff8800bacac000 task.ti: ffff8800bacac000
[12210.929328] RIP: 0010:[<ffffffff814b30e3>]  [<ffffffff814b30e3>] fib6_get_table+0x2c/0x5a
[12210.930697] RSP: 0018:ffff8800bacaf888  EFLAGS: 00010202
[12210.931563] RAX: 0000000000000748 RBX: ffffffff81a9e280 RCX: ffff8800b9ac4e28
[12210.932688] RDX: 00000000000000e9 RSI: 0000000000000002 RDI: 0000000000000286
[12210.933820] RBP: ffff8800bacaf898 R08: ffff8800b9ac4df0 R09: 000000000052001b
[12210.934941] R10: 00000000657c0000 R11: 000000000000c649 R12: 00000000000003e9
[12210.936032] R13: 00000000000003e9 R14: ffff8800bace7800 R15: ffff8800bb3ec000
[12210.937103] FS:  00007faa1766c700(0000) GS:ffff88013ac00000(0000) knlGS:0000000000000000
[12210.938321] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12210.939166] CR2: 0000000000000748 CR3: 00000000b79d6000 CR4: 00000000000406e0
[12210.940278] Stack:
[12210.940603]  ffff8800bb3ec000 ffffffff81a9e280 ffff8800bacaf8c8 ffffffff814b3135
[12210.941818]  ffff8800bb3ec000 ffffffff81a9e280 ffffffff81a9e280 ffff8800bace7800
[12210.943040]  ffff8800bacaf8f0 ffffffff81397c88 ffff8800bb3ec000 ffffffff81a9e280
[12210.944288] Call Trace:
[12210.944688]  [<ffffffff814b3135>] fib6_new_table+0x24/0x8a
[12210.945516]  [<ffffffff81397c88>] vrf_dev_init+0xd4/0x162
[12210.946328]  [<ffffffff814091e1>] register_netdevice+0x100/0x396
[12210.947209]  [<ffffffff8139823d>] vrf_newlink+0x40/0xb3
[12210.948001]  [<ffffffff814187f0>] rtnl_newlink+0x5d3/0x6d5
...

The problem above is due to the fact that the fib hash table is not
allocated when IPv6 is disabled at boot.

As for the VRF driver it should not do any IPv6 initializations if IPv6
is disabled, so it needs to know if IPv6 is disabled at boot. The disable
parameter is private to the IPv6 module, so provide an accessor for
modules to determine if IPv6 was disabled at boot time.

Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e4348637

rxrpc: Simplify connect() implementation and simplify sendmsg() op · 2341e077

David Howells authored Jun 09, 2016

Simplify the RxRPC connect() implementation. It will just note the
destination address it is given, and if a sendmsg() comes along with no
address, this will be assigned as the address. No transport struct will be
held internally, which will allow us to remove this later.

Simplify sendmsg() also. Whilst a call is active, userspace refers to it
by a private unique user ID specified in a control message. When sendmsg()
sees a user ID that doesn't map to an extant call, it creates a new call
for that user ID and attempts to add it. If, when we try to add it, the
user ID is now registered, we now reject the message with -EEXIST. We
should never see this situation unless two threads are racing, trying to
create a call with the same ID - which would be an error.

It also isn't required to provide sendmsg() with an address - provided the
control message data holds a user ID that maps to a currently active call.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2341e077

net/netlink/af_netlink.h: Remove unused structure. · 21aff3b9

Fabien Siron authored Jun 07, 2016

Signed-off-by: Fabien Siron <fabien.siron@epita.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

21aff3b9

net/mlx4_en: mlx4_en_netpoll() should schedule TX, not RX · 7d71e994

Eric Dumazet authored Jun 03, 2016

I am not sure mlx4_en_netpoll() is doing anything useful right now.

mlx4 has different NAPI structures for RX and TX, and netpoll only wants
to drain TX queues.

Lets schedule NAPI polls on TX, not RX.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d71e994

Merge branch 'BCM53xx-driver' · 23c731e8

David S. Miller authored Jun 09, 2016

Florian Fainelli says:

====================
net: dsa: Broadcom BCM53xx switches support

This patch series adds support for the Broadcom BCM53xx series aka RoboSwitches.

This driver is largely based on Jonas Gorski's b53 driver for OpenWrt which can
be found here:

https://dev.openwrt.org/browser/trunk/target/linux/generic/files/drivers/net/phy/b53

a few bug fixes and DSA-ifycation later, here is what we got.

This has been successfully tested in the following configurations:

- Broadcom BCM53011 using the SRAB bus layer with 4 ports LAN, 1 port WAN

- A Broadcom BCM7445 device with an internal Starfighter 2 switch (bcm_sf2.c)
  and a Broadcom BCM53125 hanging off one of its ports connected via MDIO, creating
  two trees hanging off each other, and this works!

- A Broadcom BCM53125 MDIO connected to a Lamobo/Bananapi R1 board using the STMMAC
  MDIO driver

For now, we do not enable Broadcom tags, because there are different
generations of switches being supported which have different tag formats, but
the plan is to enable them later on.

Support for different HW features will be added later: EEE, Compact Field
Processor (TCAM) once this initial cut gets accepted.

Testing and bug reports welcome!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

23c731e8

net: dsa: b53: Plug in VLAN support · a2482d2c

Florian Fainelli authored Jun 09, 2016

Add support for configuration VLANs on B53 devices by implementing the
port VLAN add/del/dump functions. We currently default to a behavior
which is equivalent to having VLAN filtering turned on, where all VLANs
not programmed into the VLAN port-based vector will be discarded on
ingress.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a2482d2c

net: dsa: b53: Add bridge support · ff39c2d6

Florian Fainelli authored Jun 09, 2016

Add support for HW bridging by tying the ports together in the same port
VLAN mask when they belong to the same bridge, and isolating them to be
alone with the CPU port when they are not.

Propagate STP states from the bridge layer to the switch's HW mapping
when requested.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff39c2d6

net: dsa: b53: Implement ARL add/del/dump operations · 1da6df85

Florian Fainelli authored Jun 09, 2016

Adds support for FDB add/delete/dump using the ARL read/write logic and
the ARL search logic for faster dumps. The code is made flexible enough
it could support devices with a different register layout like BCM5325
and BCM5365 which have fewer number of entries or pack values into a
single 64 bits register.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1da6df85

net: dsa: b53: Add BCM7445 quirk · 0830c980

Florian Fainelli authored Jun 09, 2016

The Broadcom BCM7445 STB chip has an issued in its revision D0 which was
previously worked around in drivers/net/dsa/bcm_sf2.c where we may
end-up double programming the integrated BCM7445 switch (bcm_sf2) and an
external Broadcom switch such as BCM53125, since these are mostly
register compatible.

Add a small quirk which just defers probing until we are sitting on the
slave DSA MDIO bus, which will allow us to intercept reads/writes and
funnel them through the SF2 internal MDIO master (which happens to
disconnect its pseudo PHY).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0830c980

net: dsa: b53: Add support for Broadcom RoboSwitch · 967dd82f

Florian Fainelli authored Jun 09, 2016

This patch adds support for Broadcom's BCM53xx switch family, also known
as RoboSwitch. Some of these switches are ubiquituous, found in home
routers, Wi-Fi routers, DSL and cable modem gateways and other
networking related products.

This drivers adds the library driver (b53_common.c) as well as a few bus
glue drivers for MDIO, SPI, Switch Register Access Block (SRAB) and
memory-mapped I/O into a SoC's address space (Broadcom BCM63xx/33xx).

Basic operations are supported to bring the Layer 1/2 up and running,
but not much more at this point, subsequent patches add the remaining
features.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

967dd82f

Merge branch 'bcm_sf2-vlan' · 409a5f27

David S. Miller authored Jun 09, 2016

Florian Fainelli says:

====================
net: dsa: bcm_sf2: add VLAN support

This is long overdue, finally add support for VLANs in the Broadcom Starfigther
2 switch driver.

There are a few things that make us differ from e.g; mv88e6xxx.c:

- we keep a software cache of which VLANs are enabled and which are not to
  dramatically speed up the VLAN dump operation, we do not have any HW operation
  which would only return the list of valid VLAN entries, they would have to be
  all queried one by one, with 4K vlans, this takes a while

- the default behavior is equivalent to setting VLAN filtering to 1, still working
  on implementing a proper port_vlan_filtering callback, but I figured the most
  conservative behavior is probably okay anyway

- without enabling VLANs, the default behavior is to receive any 802.1q frames
  (per the DSA documentation), however, once we start enabling VLAN support, if
  an interface leaves the bridge, we still want it to receive all 802.1q frames
  so we utiliez the "Join all VLAN" feature of the switch to perform that
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

409a5f27

net: dsa: bcm_sf2: Add VLAN support · 9c57a771

Florian Fainelli authored Jun 09, 2016

Add support for configuring VLANs on the Broadcom Starfigther2 switch.
This is all done through the bridge vlan facility just like other DSA
drivers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9c57a771

net: dsa: bcm_sf2: Add VLAN registers definitions · 064523ff

Florian Fainelli authored Jun 09, 2016

Add the definitions for the VLAN registers that we are going to
manipulate in subsequent patches.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

064523ff

net: dsa: bcm_sf2: Move setup function at the far end · 7fbb1a92

Florian Fainelli authored Jun 09, 2016

Re-order the bcm_sf2_sw_setup() function so that it is at the far end of
the driver to avoid any kind of forward declarations.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7fbb1a92

net: dsa: bcm_sf2: Split fast age into a helper function · a468ef45

Florian Fainelli authored Jun 09, 2016

Add a helper function to fast age something that is controlled by the
caller: port, VLAN. We will use this to implement a VLAN fast age
operation.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a468ef45

09 Jun, 2016 15 commits

Merge branch 'netdev_lockdep_set_classes' · cf515802

David S. Miller authored Jun 09, 2016

Eric Dumazet says:

====================
net: better lockdep annotations

Introduction of qdisc->running seqcount added lockdep false positives.

While chasing the bug, it came to me that we had a lot of copies of the
same stuff in virtual drivers.

This patch series has the qdisc->running fix (considers that a trylock
is attempted in lockdep terminology), and adds a generic helper so
that we no longer have to patch many virtual drivers when a new per-device
or per-qdisc lock is added.

Thanks to David Ahern for reporting the issue and testing my patches :)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

cf515802

net: ipvlan: call netdev_lockdep_set_classes() · 0d7dd798

Eric Dumazet authored Jun 09, 2016

In case a qdisc is used on a ipvlan device, we need to use different
lockdep classes to avoid false positives.

Use the new netdev_lockdep_set_classes() generic helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d7dd798

net: macvlan: call netdev_lockdep_set_classes() · 24ffd752

Eric Dumazet authored Jun 09, 2016

In case a qdisc is used on a macvlan device, we need to use different
lockdep classes to avoid false positives.

Use the new netdev_lockdep_set_classes() generic helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24ffd752

net: vrf: call netdev_lockdep_set_classes() · 78e7a2ae

Eric Dumazet authored Jun 09, 2016

In case a qdisc is used on a vrf device, we need to use different
lockdep classes to avoid false positives.

Use the new netdev_lockdep_set_classes() generic helper.
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

78e7a2ae

net: add netdev_lockdep_set_classes() helper · d3fff6c4

Eric Dumazet authored Jun 09, 2016

It is time to add netdev_lockdep_set_classes() helper
so that lockdep annotations per device type are easier to manage.

This removes a lot of copies and missing annotations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d3fff6c4

net: sched: fix qdisc->running lockdep annotations · 52fbb290

Eric Dumazet authored Jun 09, 2016

1) qdisc_run_begin() is really using the equivalent of a trylock.
  Instead of using write_seqcount_begin(), use a combination of
  raw_write_seqcount_begin() and correct lockdep annotation.

2) sch_direct_xmit() should use regular spin_lock(root_lock)

Fixes: f9eb8aea ("net_sched: transform qdisc running bit into a seqcount")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

52fbb290

netvsc: get rid of completion timeouts · 5362855a

Vitaly Kuznetsov authored Jun 09, 2016

I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting

        net_device->max_chn = 1;
        net_device->num_chn = 1;
        net_device->num_sc_offered = 0;

but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.

The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.

Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5362855a

sit: remove unnecessary protocol check in ipip6_tunnel_xmit() · adba931f

Simon Horman authored Jun 09, 2016

ipip6_tunnel_xmit() is called immediately after checking that
skb->protocol is  htons(ETH_P_IPV6) so there is no need
to check it a second time.

Found by inspection.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

adba931f

Merge branch 'cbq-kill-drop' · b8d99ba0

David S. Miller authored Jun 08, 2016

Florian Westphal says:

====================
sched, cbq: remove OVL_STRATEGY/POLICE support

iproute2 does not implement any options that result in the
TCA_CBQ_OVL_STRATEGY/TCA_CBQ_POLICE attributes being set/used.

This series removes these two attributes from cbq and makes kernel reject
 them via EOPNOTSUPP in case they are present.

The two followup changes then remove several features from qdisc
infrastructure that are then no longer used/needed.  These are:
 - The 'drop' method provided by most qdiscs
 - the 'reshape_fail' function used by some qdiscs
 - the __parent member in struct Qdisc

I tested this with allmod and allyesconfig builds and also with
a brief cbq script:

  tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 10Mbit avpkt 1000 cell 8
  tc class add dev eth0 parent 1:0 classid 1:1 est 1sec 8sec cbq bandwidth 10Mbit rate 5Mbit prio 1 allot 1514 maxburst 20 cell 8 avpkt 1000 bounded split 1:0 defmap 3f
  tc class add dev eth0 parent 1:0 classid 1:2 est 1sec 8sec cbq bandwidth 10Mbit rate 5Mbit prio 1 allot 1514 maxburst 20 cell 8 avpkt 1000 bounded split 1:0 defmap 3f
  tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip tos 0x10 0xff classid 1:1 police rate 2Mbit burst 10K reclassify
  tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip tos 0x0c 0xff classid 1:2
  tc filter add dev eth0 parent 1:0 protocol ip prio 2 u32 match ip tos 0x10 0xff classid 1:2
  tc filter add dev eth0 parent 1:0 protocol ip prio 3 u32 match ip tos 0x0 0x0 classid 1:2

No changes since v1 except patch #5 to fix up struct Qdisc layout.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

b8d99ba0

sched: place state, next_sched and gso_skb in same cacheline again · c8945043

Florian Westphal authored Jun 09, 2016

Earlier commits removed two members from struct Qdisc which places
next_sched/gso_skb into a different cacheline than ->state.

This restores the struct layout to what it was before the removal.
Move the two members, then add an annotation so they all reside in the
same cacheline.

This adds a 16 byte hole after cpu_qstats.

The hole could be closed but as it doesn't decrease total struct size just
do it this way.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8945043

sched: remove qdisc->drop · a09ceb0e

Florian Westphal authored Jun 09, 2016

after removal of TCA_CBQ_OVL_STRATEGY from cbq scheduler, there are no
more callers of ->drop() outside of other ->drop functions, i.e.
nothing calls them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a09ceb0e

sched: remove qdisc_rehape_fail · c3a173d7

Florian Westphal authored Jun 09, 2016

After the removal of TCA_CBQ_POLICE in cbq scheduler qdisc->reshape_fail
is always NULL, i.e. qdisc_rehape_fail is now the same as qdisc_drop.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3a173d7

cbq: remove TCA_CBQ_POLICE support · dd47c1fa

Florian Westphal authored Jun 09, 2016

iproute2 doesn't implement any cbq option that results in this attribute
being sent to kernel.

To make use of it, user would have to

- patch iproute2
- add a class
- attach a qdisc to the class (default pfifo doesn't work as
  q->handle is 0 and cbq_set_police() is a no-op in this case)
- re-'add' the same class (tc class change ...) again
- user must also specifiy a defmap (e.g. 'split 1:0 defmap 3f'), since
  this 'police' feature relies on its presence
- the added qdisc must be one of bfifo, pfifo or netem

If all of these conditions are met and _some_ leaf qdiscs, namely
p/bfifo, netem, plug or tbf would drop a packet, kernel calls back into
cbq, which will attempt to re-queue the skb into a different class
as indicated by the parents' defmap entry for TC_PRIO_BESTEFFORT.

[ i.e. we behave as if tc_classify returned TC_ACT_RECLASSIFY ].

This feature, which isn't documented or implemented in iproute2,
and isn't implemented consistently (most qdiscs like sfq, codel, etc
drop right away instead of attempting this reclassification) is the
sole reason for the reshape_fail and __parent member in Qdisc struct.

So remove TCA_CBQ_POLICE support from the kernel, reject it via EOPNOTSUPP
so userspace knows we don't support it, and then remove no-longer needed
infrastructure in followup commit.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd47c1fa

cbq: remove TCA_CBQ_OVL_STRATEGY support · c3498d34

Florian Westphal authored Jun 09, 2016

since initial revision of cbq in 2004 iproute 2 has never implemented
support for TCA_CBQ_OVL_STRATEGY, which is what needs to be set to
activate the class->drop() call (TC_CBQ_OVL_DROP strategy must be
set by userspace value must be set by userspace).

David Miller says:
   It seems really safe to kill this thing off, flag an error if someone
   tries to set the attribute, and therefore kill off all of the
   non-default cbq_ovl_*() functions.

A followup commit can then remove all .drop qdisc methods since this
removed the only caller.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3498d34

ip6gre: Allow live link address change · 76e48f9f

Shweta Choudaha authored Jun 08, 2016

The ip6 GRE tap device should not be forced to down state to change
the mac address and should allow live address change for tap device
similar to ipv4 gre.
Signed-off-by: Shweta Choudaha <schoudah@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76e48f9f

08 Jun, 2016 6 commits

Merge branch 'vrf-fib-rule-improve' · 753c104b

David S. Miller authored Jun 08, 2016

David Ahern says:

====================
net: vrf: Improve use of FIB rules

Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases it brings scalability
issues with the increasing rule list. All of the VRF rules have the
same format with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
loopup, the VRF rules can be consolidated to a single rule that pulls
the table from the VRF device.

This solution still allows a user to insert their own rules for VRFs,
including rules with additional attributes. Accordingly, it is backwards
compatible with existing setups and allows other policy routing as
desired.

Hopefully v5 is the charm; my e-waste can is getting full.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

753c104b

net: vrf: Add l3mdev rules on first device create · 1aa6c4f6

David Ahern authored Jun 08, 2016

Add l3mdev rule per address family when the first VRF device is
created. The rules are installed with a default preference of 1000.
Users can replace the default rule as desired.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1aa6c4f6

net: Add l3mdev rule · 96c63fa7

David Ahern authored Jun 08, 2016

Currently, VRFs require 1 oif and 1 iif rule per address family per
VRF. As the number of VRF devices increases it brings scalability
issues with the increasing rule list. All of the VRF rules have the
same format with the exception of the specific table id to direct the
lookup. Since the table id is available from the oif or iif in the
loopup, the VRF rules can be consolidated to a single rule that pulls
the table from the VRF device.

This patch introduces a new rule attribute l3mdev. The l3mdev rule
means the table id used for the lookup is pulled from the L3 master
device (e.g., VRF) rather than being statically defined. With the
l3mdev rule all of the basic VRF FIB rules are reduced to 1 l3mdev
rule per address family (IPv4 and IPv6).

If an admin wishes to insert higher priority rules for specific VRFs
those rules will co-exist with the l3mdev rule. This capability means
current VRF scripts will co-exist with this new simpler implementation.

Currently, the rules list for both ipv4 and ipv6 look like this:
    $ ip  ru ls
    1000:       from all oif vrf1 lookup 1001
    1000:       from all iif vrf1 lookup 1001
    1000:       from all oif vrf2 lookup 1002
    1000:       from all iif vrf2 lookup 1002
    1000:       from all oif vrf3 lookup 1003
    1000:       from all iif vrf3 lookup 1003
    1000:       from all oif vrf4 lookup 1004
    1000:       from all iif vrf4 lookup 1004
    1000:       from all oif vrf5 lookup 1005
    1000:       from all iif vrf5 lookup 1005
    1000:       from all oif vrf6 lookup 1006
    1000:       from all iif vrf6 lookup 1006
    1000:       from all oif vrf7 lookup 1007
    1000:       from all iif vrf7 lookup 1007
    1000:       from all oif vrf8 lookup 1008
    1000:       from all iif vrf8 lookup 1008
    ...
    32765:      from all lookup local
    32766:      from all lookup main
    32767:      from all lookup default

With the l3mdev rule the list is just the following regardless of the
number of VRFs:
    $ ip ru ls
    1000:       from all lookup [l3mdev table]
    32765:      from all lookup local
    32766:      from all lookup main
    32767:      from all lookup default

(Note: the above pretty print of the rule is based on an iproute2
       prototype. Actual verbage may change)
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

96c63fa7

Merge branch 'tipc-small-fixes' · 6278e03d

David S. Miller authored Jun 08, 2016

Jon Maloy says:

====================
tipc: two small fixes

We fix a couple of rarely seen anomalies discovered during testing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

6278e03d

tipc: change node timer unit from jiffies to ms · 5ca509fc

Jon Paul Maloy authored Jun 08, 2016

The node keepalive interval is recalculated at each timer expiration
to catch any changes in the link tolerance, and stored in a field in
struct tipc_node. We use jiffies as unit for the stored value.

This is suboptimal, because it makes the calculation unnecessary
complex, including two unit conversions. The conversions also lead to
a rounding error that causes the link "abort limit" to be 3 in the
normal case, instead of 4, as intended. This again leads to unnecessary
link resets when the network is pushed close to its limit, e.g., in an
environment with hundreds of nodes or namesapces.

In this commit, we do instead let the keepalive value be calculated and
stored in milliseconds, so that there is only one conversion and the
rounding error is eliminated.

We also remove a redundant "keepalive" field in struct tipc_link. This
is remnant from the previous implementation.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5ca509fc

tipc: correct error in node fsm · c4282ca7

Jon Paul Maloy authored Jun 08, 2016

commit 88e8ac70 ("tipc: reduce transmission rate of reset messages
when link is down") revealed a flaw in the node FSM, as defined in
the log of commit 66996b6c ("tipc: extend node FSM").

We see the following scenario:
1: Node B receives a RESET message from node A before its link endpoint
   is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
   event will not change the node FSM state, but the (distinct) link FSM
   will move to state RESETTING.
2: As an effect of the previous event, the local endpoint on B will
   declare node A lost, and post the event SELF_DOWN to the its node
   FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
   that no messages will be accepted from A until it receives another
   RESET message that confirms that A's endpoint has been reset. This
   is  wasteful, since we know this as a fact already from the first
   received RESET, but worse is that the link instance's FSM has not
   wasted this information, but instead moved on to state ESTABLISHING,
   meaning that it repeatedly sends out ACTIVATE messages to the reset
   peer A.
3: Node A will receive one of the ACTIVATE messages, move its link FSM
   to state ESTABLISHED, and start repeatedly sending out STATE messages
   to node B.
4: Node B will consistently drop these messages, since it can only accept
   accept a RESET according to its node FSM.
5: After four lost STATE messages node A will reset its link and start
   repeatedly sending out RESET messages to B.
6: Because of the reduced send rate for RESET messages, it is very
   likely that A will receive an ACTIVATE (which is sent out at a much
   higher frequency) before it gets the chance to send a RESET, and A
   may hence quickly move back to state ESTABLISHED and continue sending
   out STATE messages, which will again be dropped by B.
7: GOTO 5.
8: After having repeated the cycle 5-7 a number of times, node A will
   by chance get in between with sending a RESET, and the situation is
   resolved.

Unfortunately, we have seen that it may take a substantial amount of
time before this vicious loop is broken, sometimes in the order of
minutes.

We correct this by making a small correction to the node FSM: When a
node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now
moves directly back to state SELF_DOWN_PEER_DOWN, instead of as now
SELF_DOWN_PEER_LEAVING. This is logically consistent, since we don't
need to wait for RESET confirmation from of an endpoint that we alread
know has been reset. It also means that node B in the scenario above
will not be dropping incoming STATE messages, and the link can come up
immediately.

Finally, a symmetry comparison reveals that the  FSM has a similar
error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
Instead of moving to PERR_DOWN_SELF_LEAVING, it should move directly
to SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect
of this logical error, we choose fix this one, too.

The node FSM looks as follows after those changes:

                           +----------------------------------------+
                           |                           PEER_DOWN_EVT|
                           |                                        |
  +------------------------+----------------+                       |
  |SELF_DOWN_EVT           |                |                       |
  |                        |                |                       |
  |              +-----------+          +-----------+               |
  |              |NODE_      |          |NODE_      |               |
  |   +----------|FAILINGOVER|<---------|SYNCHING   |-----------+   |
  |   |SELF_     +-----------+ FAILOVER_+-----------+   PEER_   |   |
  |   |DOWN_EVT   |          A BEGIN_EVT  A         |   DOWN_EVT|   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |FAILOVER_ |FAILOVER_   |SYNCH_   |SYNCH_     |   |
  |   |           |END_EVT   |BEGIN_EVT   |BEGIN_EVT|END_EVT    |   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |         +--------------+        |           |   |
  |   |           +-------->|   SELF_UP_   |<-------+           |   |
  |   |   +-----------------|   PEER_UP    |----------------+   |   |
  |   |   |SELF_DOWN_EVT    +--------------+   PEER_DOWN_EVT|   |   |
  |   |   |                    A        A                   |   |   |
  |   |   |                    |        |                   |   |   |
  |   |   |         PEER_UP_EVT|        |SELF_UP_EVT        |   |   |
  |   |   |                    |        |                   |   |   |
  V   V   V                    |        |                   V   V   V
+------------+       +-----------+    +-----------+       +------------+
|SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
|PEER_LEAVING|       |PEER_COMING|    |SELF_COMING|       |SELF_LEAVING|
+------------+       +-----------+    +-----------+       +------------+
       |               |       A        A       |                |
       |               |       |        |       |                |
       |       SELF_   |       |SELF_   |PEER_  |PEER_           |
       |       DOWN_EVT|       |UP_EVT  |UP_EVT |DOWN_EVT        |
       |               |       |        |       |                |
       |               |       |        |       |                |
       |               |    +--------------+    |                |
       |PEER_DOWN_EVT  +--->|  SELF_DOWN_  |<---+   SELF_DOWN_EVT|
       +------------------->|  PEER_DOWN   |<--------------------+
                            +--------------+
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4282ca7