Commits · 3e9b27b8fc8b00c9edaaa5fd64636e6f2b331f43 · Kirill Smelkov / linux

01 Mar, 2016 28 commits

mlxsw: spectrum: Unmap local port from module during teardown · 3e9b27b8

Ido Schimmel authored Feb 26, 2016

When splitting a port we replace it with 2 or 4 other ports. To be able
to do that we need to remove the original port netdev and unmap it from
its module. However, we first mark it as disabled, as active ports
cannot be unmapped.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3e9b27b8

mlxsw: core: Add devlink port splitter callbacks · 284ef803

Jiri Pirko authored Feb 26, 2016

Add middle layer in mlxsw core code to forward port split/unsplit calls
into specific ASIC drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

284ef803

mlxsw: Implement devlink interface · c4745500

Jiri Pirko authored Feb 26, 2016

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c4745500

mlx4: Implement port type setting via devlink interface · b2facd95

Jiri Pirko authored Feb 26, 2016

So far, there has been an mlx4-specific sysfs file allowing user to
change port type to either Ethernet of InfiniBand. This is very
inconvenient.

Allow to expose the same ability to set port type in a generic way
using devlink interface.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2facd95

mlx4: Implement devlink interface · 09d4d087

Jiri Pirko authored Feb 26, 2016

Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v2->v3:
-add dev param to devlink_register (api change)
Signed-off-by: David S. Miller <davem@davemloft.net>

09d4d087

Introduce devlink infrastructure · bfcd3a46

Jiri Pirko authored Feb 26, 2016

Introduce devlink infrastructure for drivers to register and expose to
userspace via generic Netlink interface.

There are two basic objects defined:
devlink - one instance for every "parent device", for example switch ASIC
devlink port - one instance for every physical port of the device.

This initial portion implements basic get/dump of objects to userspace.
Also, port splitter and port type setting is implemented.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfcd3a46

Merge branch 'tc-sw-only' · bd070e21

David S. Miller authored Mar 01, 2016

John Fastabend says:

====================
tc software only

This adds a software only flag to tc but incorporates a bunch of comments
from the original attempt at this.

First instead of having the offload decision logic be embedded in cls_u32
I lifted into cls_pkt.h so it can be used anywhere and named the flag
TCA_CLS_FLAGS_SKIP_HW (Thanks Jiri ;)

In order to do this I put the flag defines in pkt_cls.h as well. However
it was suggested that perhaps these flags could be lifted into the
upper layer of TCA_ as well but I'm afraid this can not be done with
existing tc design as far as I can tell. The problem is the filters are
packed and unpacked in the classifier specific code and pushing the flags
through the high level doesn't seem easily doable. And we already have
this design where classifiers handle generic options such as actions and
policers. So I think adding one more thing here is OK as 'tc', et. al.
already know how to handle this type of thing.
====================
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bd070e21

net: sched: cls_u32 add bit to specify software only rules · 9e8ce79c

John Fastabend authored Feb 26, 2016

In the initial implementation the only way to stop a rule from being
inserted into the hardware table was via the device feature flag.
However this doesn't work well when working on an end host system
where packets are expect to hit both the hardware and software
datapaths.

For example we can imagine a rule that will match an IP address and
increment a field. If we install this rule in both hardware and
software we may increment the field twice. To date we have only
added support for the drop action so we have been able to ignore
these cases. But as we extend the action support we will hit this
example plus more such cases. Arguably these are not even corner
cases in many working systems these cases will be common.

To avoid forcing the driver to always abort (i.e. the above example)
this patch adds a flag to add a rule in software only. A careful
user can use this flag to build software and hardware datapaths
that work together. One example we have found particularly useful
is to use hardware resources to set the skb->mark on the skb when
the match may be expensive to run in software but a mark lookup
in a hash table is cheap. The idea here is hardware can do in one
lookup what the u32 classifier may need to traverse multiple lists
and hash tables to compute. The flag is only passed down on inserts.
On deletion to avoid stale references in hardware we always try
to remove a rule if it exists.

The flags field is part of the classifier specific options. Although
it is tempting to lift this into the generic structure doing this
proves difficult do to how the tc netlink attributes are implemented
along with how the dump/change routines are called. There is also
precedence for putting seemingly generic pieces in the specific
classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
although not ideal I've left FLAGS in the u32 options as well as it
simplifies the code greatly and user space has already learned how
to manage these bits ala 'tc' tool.

Another thing if trying to update a rule we require the flags to
be unchanged. This is to force user space, software u32 and
the hardware u32 to keep in sync. Thanks to Simon Horman for
catching this case.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9e8ce79c

net: cls_u32: move TC offload feature bit into cls_u32 offload logic · 2b6ab0d3

John Fastabend authored Feb 26, 2016

In the original series drivers would get offload requests for cls_u32
rules even if the feature bit is disabled. This meant the driver had
to do a boiler plate check on the feature bit before adding/deleting
the rule.

This patch lifts the check into the core code and removes it from the
driver specific case.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b6ab0d3

net: sched: consolidate offload decision in cls_u32 · 6843e7a2

John Fastabend authored Feb 26, 2016

The offload decision was originally very basic and tied to if the dev
implemented the appropriate ndo op hook. The next step is to allow
the user to more flexibly define if any paticular rule should be
offloaded or not. In order to have this logic in one function lift
the current check into a helper routine tc_should_offload().
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6843e7a2

Merge branch 'ndo_set_rx_headroom' · d2e42a17

David S. Miller authored Mar 01, 2016

Paolo Abeni says:

====================
bridge/ovs: avoid skb head copy on frame forwarding

Currently, while when an OVS or Linux bridge is used to forward frames towards
some tunnel device, a skb_head_copy() may occur if the ingress device do not
provide enough headroom for the tx encapsulation.

This patch series tries to address the issue implementing a new ndo operation to
allow the master device to control the headroom used when allocating the skb on
frame reception.

Said operation is used by the Linux bridge to notify the bridged ports of
needed_headroom changes, and similar bookkeeping and behaviour is also added to
openvswitch, on a per datapath basis.

Finally, the operation is implemented for veth and tun device, which give
performance improvement in the 6-12% range when forwarding frames from said
devices towards a vxlan tunnel.

v2:
- fix netdev_get_fwd_headroom() behaviour
- remove some code duplication with the netdev_set_rx_headroom() and
   netdev_reset_rx_headroom() helpers
- handle headroom reset on [v]port removal/deletion
- initialize tun align to the old default value

v3:
- fix a comment typo
====================
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2e42a17

veth: implement ndo_set_rx_headroom · 163e5292

Paolo Abeni authored Feb 26, 2016

The rx headroom for veth dev is the peer device needed_headroom.
Avoid ping-pong updates setting the private flag IFF_PHONY_HEADROOM.

This avoids skb head reallocation when forwarding from a veth dev
towards a device adding some kind of encapsulation.

When transmitting frames below the MTU size towards a vxlan device,
this gives about 10% performance speed-up when OVS is used to connect
the veth and the vxlan device and a little more when using a
plain Linux bridge.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

163e5292

net/tun: implement ndo_set_rx_headroom · eaea34b2

Paolo Abeni authored Feb 26, 2016

ndo_set_rx_headroom controls the align value used by tun devices to
allocate skbs on frame reception.
When the xmit device adds a large encapsulation, this avoids an skb
head reallocation on forwarding.

The measured improvement when forwarding towards a vxlan dev with
frame size below the egress device MTU is as follow:

vxlan over ipv6, bridged: +6%
vxlan over ipv6, ovs: +7%

In case of ipv4 tunnels there is no improvement, since the tun
device default alignment provides enough headroom to avoid the skb
head reallocation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

eaea34b2

ovs: propagate per dp max headroom to all vports · 3a927bc7

Paolo Abeni authored Feb 26, 2016

This patch implements bookkeeping support to compute the maximum
headroom for all the devices in each datapath. When said value
changes, the underlying devs are notified via the
ndo_set_rx_headroom method.

This also increases the internal vports xmit performance.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3a927bc7

bridge: notify enslaved devices of headroom changes · 45493d47

Paolo Abeni authored Feb 26, 2016

On bridge needed_headroom changes, the enslaved devices are
notified via the ndo_set_rx_headroom method
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45493d47

netdev: introduce ndo_set_rx_headroom · 871b642a

Paolo Abeni authored Feb 26, 2016

This method allows the controlling device (i.e. the bridge) to specify
additional headroom to be allocated for skb head on frame reception.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

871b642a

Merge branch 'bnxt_en-next' · 46d5efa9

David S. Miller authored Mar 01, 2016

Michael Chan says:

====================
bnxt_en: updates for net-next.

Miscellaneous updates covering SRIOV, IRQ coalescing, firmware logging and
package version for net-next.  Thanks.

v2: Updated description and added more comments for patch 1.  Fixed
function parameters formatting for patch 4.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

46d5efa9

bnxt_en: Add hwrm_send_message_silent(). · 90e20921

Michael Chan authored Feb 26, 2016

This is used to send NVM_FIND_DIR_ENTRY messages which can return error
if the entry is not found.  This is normal and the error message will
cause unnecessary alarm, so silence it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90e20921

bnxt_en: Refactor _hwrm_send_message(). · fbfbc485

Michael Chan authored Feb 26, 2016

Add a new function bnxt_do_send_msg() to do essentially the same thing
with an additional paramter to silence error response messages.  All
current callers will set silent to false.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbfbc485

bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO · 3ebf6f0a

Rob Swindell authored Feb 26, 2016

For everything to fit, we remove the PHY microcode version and replace it
with the firmware package version in the fw_version string.
Signed-off-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ebf6f0a

bnxt_en: Fix dmesg log firmware error messages. · a8643e16

Michael Chan authored Feb 26, 2016

Use appropriate firmware request header structure to prepare the
firmware messages.  This avoids the unnecessary conversion of the
fields to 32-bit fields.  Add appropriate endian conversion when
printing out the message fields in dmesg so that they appear correct
in the log.
Reported-by: Rob Swindell <swindell@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8643e16

bnxt_en: Use firmware provided message timeout value. · ff4fe81d

Michael Chan authored Feb 26, 2016

Before this patch, we used a hardcoded value of 500 msec as the default
value for firmware message response timeout.  For better portability with
future hardware or debug platforms, use the value provided by firmware in
the first response and store it for all susequent messages.  Redefine the
macro HWRM_CMD_TIMEOUT to the stored value.  Since we don't have the
value yet in the first message, use the 500 ms default if the stored value
is zero.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff4fe81d

bnxt_en: Add coalescing support for tx rings. · dfc9c94a

Michael Chan authored Feb 26, 2016

When tx and rx rings don't share the same completion ring, tx coalescing
parameters can be set differently from the rx coalescing parameters.
Otherwise, use rx coalescing parameters on shared completion rings.

Adjust rx coalescing default values to lower interrupt rate.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfc9c94a

bnxt_en: Refactor bnxt_hwrm_set_coal(). · bb053f52

Michael Chan authored Feb 26, 2016

Add a function to set all the coalescing parameters.  The function can
be used later to set both rx and tx coalescing parameters.

v2: Fixed function parameters formatting requested by DaveM.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb053f52

bnxt_en: Store irq coalescing timer values in micro seconds. · dfb5b894

Michael Chan authored Feb 26, 2016

Don't convert these to internal hardware tick values before storing
them.  This avoids the confusion of ethtool -c returning slightly
different values than the ones set using ethtool -C when we convert
hardware tick values back to micro seconds.  Add better comments for
the hardware settings.

Also, rename the current set of coalescing fields with rx_ prefix.
The next patch will add support of tx coalescing values.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfb5b894

bnxt_en: Send PF driver unload notification to all VFs. · 19241368

Jeffrey Huang authored Feb 26, 2016

During remove_one() when SRIOV is enabled, the PF driver
should broadcast PF driver unload notification to all
VFs that are attached to VMs. Upon receiving the PF
driver unload notification, the VF driver should print
a warning message to message log.  Certain operations on the
VF may not succeed after the PF has unloaded.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19241368

bnxt_en: Improve bnxt_vf_update_mac(). · 3874d6a8

Jeffrey Huang authored Feb 26, 2016

Allow the VF to setup its own MAC address if the PF has not administratively
set it for the VF.  To do that, we should always store the MAC address
from the firmware.  There are 2 cases:

1. The MAC address is valid.  This MAC address is assigned by the PF and
it needs to override the current VF MAC address.

2. The MAC address is zero.  The VF will use a random MAC address by default.
By storing this 0 MAC address in the VF structure, it will allow the VF
user to change the MAC address later using ndo_set_mac_address() when
it sees that the stored MAC address is 0.

v2: Expanded descriptions and added more comments.
Signed-off-by: Jeffrey Huang <huangjw@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3874d6a8

Merge tag 'linux-can-next-for-4.6-20160226' of... · 0c92c949

David S. Miller authored Mar 01, 2016

Merge tag 'linux-can-next-for-4.6-20160226' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2016-02-26

this is a pull request of 3 patch for net-next/master.

There are two patches by Simon Horman, in which the device tree support
for the rcar_can driver is improved. One patch by me fixes the bad
coding style of the ems_usb driver which was introduced recently.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

0c92c949

29 Feb, 2016 12 commits

Merge branch 'lan78xx-next' · 136065e6

David S. Miller authored Feb 29, 2016

Woojung Huh says:

====================
lan78xx: driver update

This patch series add new ethtool functions of set_pauseparam  & get_pauseparam
and MAINTAINERS entry.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

136065e6

MAINTAINERS: Add LAN78XX entry · 146498ea

Woojung.Huh@microchip.com authored Feb 25, 2016

Add maintainers for Microchip LAN78XX.
UNGLinuxDriver@microchip.com is alias email which goes to current
developers work for Microchip Network related products.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

146498ea

lan78xx: add ethtool set & get pause functions · 349e0c5e

Woojung.Huh@microchip.com authored Feb 25, 2016

Add ethtool operations of set_pauseram and get_pauseparm.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

349e0c5e

lan78xx: remove unnecessary code · e270b2db

Woojung.Huh@microchip.com authored Feb 25, 2016

It is not required after commit cd772de3
("phy: keep pause flags in phy driver features")
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e270b2db

lan78xx: replace devid to chipid & chiprev · 87177ba6

Woojung.Huh@microchip.com authored Feb 25, 2016

Replace devid to chipid & chiprev for easy access.
Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87177ba6

hv_netvsc: add ethtool support for set and get of settings · 49eb9389

sixiao@microsoft.com authored Feb 25, 2016

This patch allows the user to set and retrieve speed and duplex of the
hv_netvsc device via ethtool.

Example:
$ ethtool eth0
Settings for eth0:
...
    Speed: Unknown!
    Duplex: Unknown! (255)
...
$ ethtool -s eth0 speed 1000 duplex full
$ ethtool eth0
Settings for eth0:
...
    Speed: 1000Mb/s
    Duplex: Full
...

This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

49eb9389

Merge branch 'sch-accurate-backlog' · c0affa19

David S. Miller authored Feb 29, 2016

Cong Wang says:

====================
net_sched: update backlog for hierarchical qdisc's

For hierarchical qdisc like HTB, we currently only update its qlen
but leave its backlog as zero:

    qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
     Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
     backlog 0b 72p requeues 0

This patchset makes backlog as accurate as qlen.

v3: rebase and fix the n==0 case for qdisc_tree_reduce_backlog()
v2: rebase and update changelog, not code change
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

c0affa19

sch_dsmark: update backlog as well · bdf17661

WANG Cong authored Feb 25, 2016

Similarly, we need to update backlog too when we update qlen.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bdf17661

sch_htb: update backlog as well · 431e3a8e

WANG Cong authored Feb 25, 2016

We saw qlen!=0 but backlog==0 on our production machine:

qdisc htb 1: dev eth0 root refcnt 2 r2q 10 default 1 direct_packets_stat 0 ver 3.17
 Sent 172680457356 bytes 222469449 pkt (dropped 0, overlimits 123575834 requeues 0)
 backlog 0b 72p requeues 0

The problem is we only count qlen for HTB qdisc but not backlog.
We need to update backlog too when we update qlen, so that we
can at least know the average packet length.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

431e3a8e

net_sched: update hierarchical backlog too · 2ccccf5f

WANG Cong authored Feb 25, 2016

When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors, we need to update the backlog too to
keep the stats on root qdisc accurate.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2ccccf5f

net_sched: introduce qdisc_replace() helper · 86a7996c

WANG Cong authored Feb 25, 2016

Remove nearly duplicated code and prepare for the following patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86a7996c

3c59x: Ensure to apply the expires time · f12d33f4

Stafford Horne authored Feb 28, 2016

In commit 5b6490de ("3c59x: Use setup_timer()") Amitoj
removed add_timer which sets up the epires timer.  In this patch
the behavior is restore but it uses mod_timer which is a bit more
compact.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f12d33f4