Commits · d98c802213542eabe282f025728e44b3c4bed6c5 · Kirill Smelkov / linux

06 Apr, 2016 40 commits

Drivers: hv: vmbus: remove code duplication in message handling · d98c8022

Vitaly Kuznetsov authored Feb 26, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

We have three functions dealing with messages, and they all implement
the same logic to finalize reads; move it to vmbus_signal_eom().
Suggested-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0f70b669)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

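Below is a user-space sketch of what a single finalize-read helper can look like. Everything in it is hypothetical and only models the idea: sim_message, sim_signal_eom(), sim_write_eom() and the eom_writes counter. It assumes that finalizing a read means two steps. First, mark the message slot free. Then write the end-of-message register, but only when the host has flagged another pending message.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a Hyper-V message slot; not the kernel's types. */
enum { HVMSG_NONE = 0 };

struct sim_message {
	uint32_t message_type;  /* HVMSG_NONE marks the slot as free */
	bool msg_pending;       /* host has another message queued */
};

/* Counts simulated writes to the end-of-message (EOM) register. */
static unsigned int eom_writes;

static void sim_write_eom(void)
{
	eom_writes++;
}

/* One shared helper instead of three copies of the finalize-read logic. */
static void sim_signal_eom(struct sim_message *msg)
{
	msg->message_type = HVMSG_NONE;
	/* Order the slot release before the msg_pending check. */
	atomic_thread_fence(memory_order_seq_cst);
	if (msg->msg_pending)
		sim_write_eom();
}
```

Each of the three message paths would then call the one helper rather than repeating the sequence.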

Drivers: hv: vmbus: avoid wait_for_completion() on crash · 0e9f0780

Vitaly Kuznetsov authored Feb 26, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

wait_for_completion() may sleep and enables interrupts, which is
something we really want to avoid on crashes because interrupt
handlers can cause other crashes. Switch to the recently introduced
vmbus_wait_for_unload(), which does a busy wait instead.
Reported-by: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 75ff3a8a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: don't lose HVMSG_TIMER_EXPIRED messages · be8e1814

Vitaly Kuznetsov authored Feb 26, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

We must handle HVMSG_TIMER_EXPIRED messages in interrupt context,
and we offload all the rest to the vmbus_on_msg_dpc() tasklet. This function
loops to see if there are new messages pending. If we ever see an
HVMSG_TIMER_EXPIRED message there, we are going to lose it because we can't
handle it from there. Avoid looping in vmbus_on_msg_dpc(); we're OK
with handling one message per interrupt.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7be3e169)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers/hv: Move VMBus hypercall codes into Hyper-V UAPI header · 88e58a0d

Andrey Smetanin authored Feb 11, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

VMBus hypercall codes inside the Hyper-V UAPI header will
be used by QEMU to implement VMBus host device support.
Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
[Do not rename the constant at the same time as moving it, as that
 would cause semantic conflicts with the Hyper-V tree. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

(back ported from commit 18f09861)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

 Conflicts:
	arch/x86/include/uapi/asm/hyperv.h

Drivers: hv: vmbus: Give control over how the ring access is serialized · 38cee56a

K. Y. Srinivasan authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

On the channel send side, many of the VMBUS
device drivers explicitly serialize access to the
outgoing ring buffer. Give more control to the
VMBUS device drivers in terms of how to serialize
access to the outgoing ring buffer.
The default behavior will be to acquire the
ring lock to preserve the current behavior.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit fe760e4d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

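Below is a minimal user-space model of a ring writer whose locking is controlled per channel. It assumes a boolean in the channel that defaults to taking the lock. The struct, the acquire_ring_lock field and the helpers are hypothetical names chosen for illustration, and a pthread mutex stands in for the kernel spin lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a channel's outgoing ring; not the kernel struct. */
struct sim_channel {
	pthread_mutex_t ring_lock;
	bool acquire_ring_lock;  /* default true: keep the old behavior */
	char ring[64];
	size_t used;
};

static void sim_channel_init(struct sim_channel *c)
{
	pthread_mutex_init(&c->ring_lock, NULL);
	c->acquire_ring_lock = true;
	c->used = 0;
}

/* A driver that already serializes its own sends can opt out of the lock. */
static void sim_set_channel_lock_state(struct sim_channel *c, bool state)
{
	c->acquire_ring_lock = state;
}

static int sim_ring_write(struct sim_channel *c, const void *buf, size_t len)
{
	int ret = 0;

	if (c->acquire_ring_lock)
		pthread_mutex_lock(&c->ring_lock);

	if (len > sizeof(c->ring) - c->used) {
		ret = -EAGAIN;  /* not enough room left in the ring */
	} else {
		memcpy(c->ring + c->used, buf, len);
		c->used += len;
	}

	if (c->acquire_ring_lock)
		pthread_mutex_unlock(&c->ring_lock);
	return ret;
}
```

The flag is meant to be set once, before the channel carries traffic. Flipping it while other threads are sending would itself be a race.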

Drivers: hv: vmbus: Eliminate the spin lock on the read path · ebade2fc

K. Y. Srinivasan authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

The function hv_ringbuffer_read() is always called on a pre-assigned
CPU. Each channel is bound to a specific CPU, and this function is
always called on the CPU the channel is bound to. There is no need to
acquire the spin lock; get rid of this overhead.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3eba9a77)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister() · 3173614d

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

The hvsock driver needs this API to release all the resources related
to the channel.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 85d9aa70)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: add a per-channel rescind callback · 041b9521

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

This will be used by the coming hv_sock driver.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 499e8401)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: add a hvsock flag in struct hv_driver · 36ca4f6f

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

Only the coming hv_sock driver has a "true" value for this flag.

We treat the hvsock offers/channels as special VMBus devices.
Since the hv_sock driver handles all the hvsock offers/channels, we need to
tweak vmbus_match() for the hv_sock driver, so we introduce this flag.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8981da32)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: define a new VMBus message type for hvsock · 01c414d9

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

A function to send this type of message is also added.

The coming net/hvsock driver will use this function to proactively request
the host to offer a VMBus channel for a new hvsock connection.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5c23a1a5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: vmbus_sendpacket_ctl: hvsock: avoid unnecessary signaling · ed50c38a

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

When the hvsock channel's outbound ringbuffer is full (i.e.,
hv_ringbuffer_write() returns -EAGAIN), we should avoid unnecessarily
signaling the host.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5f363bc3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

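The signaling decision can be sketched as follows. This is a hypothetical model: sim_sendpacket() and sim_signal_host() are illustrative names. The sketch assumes that non-hvsock channels keep signaling the host even when the ring write fails, while hvsock channels skip the signal when the write returns an error such as -EAGAIN.

```c
#include <errno.h>
#include <stdbool.h>

/* Counts simulated host notifications. */
static unsigned int host_signals;

static void sim_signal_host(void)
{
	host_signals++;
}

/*
 * ring_ret:    result of the (hypothetical) ring-buffer write
 * need_signal: whether a successful write decided the host must be kicked
 */
static int sim_sendpacket(bool is_hvsock, int ring_ret, bool need_signal)
{
	/*
	 * Signal on a successful write that needs it. On failure, only
	 * non-hvsock channels still signal; for hvsock nothing was queued,
	 * so the signal would be wasted.
	 */
	if ((ring_ret == 0 && need_signal) || (ring_ret != 0 && !is_hvsock))
		sim_signal_host();
	return ring_ret;
}
```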

Drivers: hv: vmbus: define the new offer type for Hyper-V socket (hvsock) · 5b726cdd

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

A helper function is also added.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e8d6ca02)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: add a helper function to set a channel's pending send size · e719665b

Dexuan Cui authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

This will be used by the coming net/hvsock driver.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3c75354d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: don't manipulate clocksources on crash · 10a56700

Vitaly Kuznetsov authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

clocksource_change_rating() involves mutex usage and can't be called
in interrupt context. It also makes sense to avoid doing redundant work
on crash.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3ccb4fd8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: avoid scheduling in interrupt context in vmbus_initiate_unload() · 744007a5

Vitaly Kuznetsov authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

We have to call vmbus_initiate_unload() on crash to make kdump work, but
the crash can also happen in interrupt context (e.g. Sysrq + c results in
such a crash), where we can't schedule, or the following will happen:

[  314.905786] bad: scheduling from the idle thread!

Just skipping the wait (or even adding some random wait here) won't help:
to make the host-side magic work, we're supposed to receive CHANNELMSG_UNLOAD
(and actually confirm the fact that we received it), but we can't use the
interrupt-based path (vmbus_isr() -> vmbus_on_msg_dpc()). Implement a simple
busy wait that ignores all the other messages, and use it if we're in an
interrupt context.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 41571916)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

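Below is a sketch of the busy-wait idea: poll for the unload response without sleeping, and drop every other message. The message types, sim_wait_for_unload() and the scripted message source are all hypothetical. The max_polls bound exists only so the sketch terminates under test; a real loop would also pause briefly between polls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types; only the unload response matters here. */
enum sim_msg_type { SIM_MSG_NONE, SIM_MSG_OTHER, SIM_MSG_UNLOAD_RESPONSE };

/*
 * Poll without sleeping or enabling interrupts. next_msg() stands in for
 * reading the per-CPU message page; anything that is not the unload
 * response is consumed and ignored. Returns true if the response arrived
 * within max_polls iterations.
 */
static bool sim_wait_for_unload(enum sim_msg_type (*next_msg)(void),
				unsigned long max_polls)
{
	for (unsigned long i = 0; i < max_polls; i++) {
		if (next_msg() == SIM_MSG_UNLOAD_RESPONSE)
			return true;
		/* SIM_MSG_NONE or SIM_MSG_OTHER: keep polling. */
	}
	return false;
}

/* Scripted message source used to exercise the loop. */
static const enum sim_msg_type *script;
static size_t script_len, script_pos;

static enum sim_msg_type scripted_next(void)
{
	return script_pos < script_len ? script[script_pos++] : SIM_MSG_NONE;
}
```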

Drivers: hv: vmbus: avoid infinite loop in init_vp_index() · fd0e6a83

Vitaly Kuznetsov authored Jan 27, 2016

BugLink: http://bugs.launchpad.net/bugs/1541585

When we pick a CPU to use for a new subchannel, we try to find an unused one
on the appropriate NUMA node; we keep track of them with the
primary->alloced_cpus_in_node mask. Under normal circumstances we don't run
out of available CPUs, but it is possible when we don't initialize some
CPUs in Linux, e.g. when we boot with the 'nr_cpus=' limitation.

Avoid the infinite loop in init_vp_index() by checking that we still have
non-used CPUs in the alloced_cpus_in_node mask and resetting it in case
we don't.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 79fd8e70)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

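The fix amounts to detecting an exhausted mask before searching it. Below is a hypothetical bitmask model (at most 64 CPUs, names invented for illustration). Once every CPU in the node has been handed out, it resets that node's allocation bits instead of searching forever.

```c
#include <stdint.h>

/*
 * Bit i of node_mask: CPU i belongs to the NUMA node.
 * Bit i of *alloced:  CPU i was already handed to a subchannel.
 * Returns the chosen CPU, or -1 if the node has no CPUs at all.
 */
static int sim_pick_cpu(uint64_t node_mask, uint64_t *alloced)
{
	if (node_mask == 0)
		return -1;

	/* Every CPU of the node is taken: start over rather than spin. */
	if ((*alloced & node_mask) == node_mask)
		*alloced &= ~node_mask;

	uint64_t free_cpus = node_mask & ~*alloced;  /* non-zero here */
	int cpu = __builtin_ctzll(free_cpus);        /* lowest free CPU (GCC/Clang builtin) */

	*alloced |= UINT64_C(1) << cpu;
	return cpu;
}
```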

Drivers: hv: vmbus: Add vendor and device attributes · 68291980

K. Y. Srinivasan authored Dec 25, 2015

BugLink: http://bugs.launchpad.net/bugs/1541585

Add vendor and device attributes to VMBUS devices. These will be used
by Hyper-V tools as well as user-level RDMA libraries that will use the
vendor/device tuple to discover the RDMA device.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7047f17d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Cleanup vmbus_set_event() · 1454f3a3

K. Y. Srinivasan authored Dec 21, 2015

BugLink: http://bugs.launchpad.net/bugs/1541585

Clean up vmbus_set_event() by inlining the hypercall to post
the event, and since the return value of vmbus_set_event() is not checked,
make it void. As part of this cleanup, get rid of the function
hv_signal_event(), as it is only called from vmbus_set_event().
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1b807e10)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled · 8f3b0067

Len Brown authored Mar 13, 2016

BugLink: http://bugs.launchpad.net/bugs/1559918

Some SKL-H configurations require "intel_idle.max_cstate=7" to boot.
While that is an effective workaround, it disables C10.

This patch detects the problematic configuration,
and disables C8 and C9, keeping C10 enabled.

Note that enabling SGX in BIOS SETUP can also prevent this issue,
if the system BIOS provides that option.

https://bugzilla.kernel.org/show_bug.cgi?id=109081
"Freezes with Intel i7 6700HQ (Skylake), unless intel_idle.max_cstate=7"
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: stable@vger.kernel.org
(cherry picked from commit d70e28f5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: ixgbe: abort with cls u32 divisor groups greater than 1 · 5809416a

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

This patch ensures ixgbe will not try to offload hash tables from the
u32 module. The device class does not currently support this, so until
it is enabled, just abort on these tables.

Interestingly, the more flexible your hardware is, the less code you
need to implement to guard against these cases.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit db956ae8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: ixgbe: add support for tc_u32 offload · 1cb785c4

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.

However, it is an interesting subset because it handles the u32 next
hdr logic to correctly map TCP packets from IP headers using the ihl
and protocol fields. After this is accepted, we can extend the match
and action fields easily by updating the model header file.

Also, only the drop action is supported initially.

Here is a short test script:

 #tc qdisc add dev eth4 ingress
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 800: order 1 \
	match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop

<-- hardware has dst/src ip match rule installed -->

 #tc filter del dev eth4 parent ffff: prio 49152
 #tc filter add dev eth4 parent ffff: protocol ip prio 99 \
	handle 1: u32 divisor 1
 #tc filter add dev eth4 protocol ip parent ffff: prio 99 \
	u32 ht 800: order 1 link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
 #tc filter add dev eth4 parent ffff: protocol ip \
	u32 ht 1: order 3 match tcp src 23 ffff action drop

<-- hardware has tcp src port rule installed -->

 #tc qdisc del dev eth4 parent ffff:

<-- hardware cleaned up -->
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b82b17d9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

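In the second filter, `offset at 0 mask 0f00 shift 6` computes where the TCP header starts. Our reading of the u32 syntax is that it takes the 16-bit word at offset 0 of the IP header. Bits 8-11 of that word hold the IHL, counted in 32-bit words. Masking with 0x0f00 and shifting right by 6 (rather than 8) therefore yields IHL * 4, the header length in bytes. The helper names below are hypothetical.

```c
#include <stdint.h>

/*
 * Mirror of the u32 "offset at 0 mask 0f00 shift 6" arithmetic on the
 * first 16-bit word (version, IHL, TOS) of an IPv4 header, read in
 * network byte order. Shifting by 6 instead of 8 multiplies IHL by 4.
 */
static unsigned int u32_next_hdr_offset(const uint8_t *ip_hdr)
{
	uint16_t word = (uint16_t)((ip_hdr[0] << 8) | ip_hdr[1]);

	return (word & 0x0f00) >> 6;
}

/* Direct computation for comparison: IHL * 4 bytes. */
static unsigned int ipv4_header_len(const uint8_t *ip_hdr)
{
	return (ip_hdr[0] & 0x0f) * 4u;
}
```

For an option-less header (first byte 0x45), both functions give 20 bytes. With the maximum IHL of 15, both give 60.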

net: sched: add cls_u32 offload hooks for netdevs · 2f94659a

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.

This work aligns with how network drivers have been doing qdisc
offloads for mqprio.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a1b7c5fd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: rework setup_tc ndo op to consume general tc operand · cc16f69f

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this, we provide a structured union
and a type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(back ported from commit 16e5cc64)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

 Conflicts:
	drivers/net/ethernet/mellanox/mlx4/en_netdev.c

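Below is a user-space sketch of the "type flag plus union" pattern the commit describes. The sim_* names, the enum values and the 8-class limit are hypothetical. The point is that one callback signature can carry different payloads, with the driver dispatching on the type.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical offload kinds, modeled loosely on the commit's idea. */
enum sim_tc_setup_type { SIM_TC_SETUP_MQPRIO, SIM_TC_SETUP_CLSU32 };

struct sim_cls_u32_offload {
	uint32_t handle;
};

/* Type flag plus anonymous union (C11): one signature for every kind. */
struct sim_tc_to_netdev {
	enum sim_tc_setup_type type;
	union {
		uint8_t tc;                           /* MQPRIO: traffic class count */
		struct sim_cls_u32_offload *cls_u32;  /* u32 classifier rule */
	};
};

/* A driver callback dispatches on the type flag. */
static int sim_driver_setup_tc(const struct sim_tc_to_netdev *tc)
{
	switch (tc->type) {
	case SIM_TC_SETUP_MQPRIO:
		return tc->tc <= 8 ? 0 : -EINVAL;  /* assume 8 classes max */
	case SIM_TC_SETUP_CLSU32:
		return tc->cls_u32 ? 0 : -EINVAL;
	default:
		return -EOPNOTSUPP;
	}
}
```

A new classifier would add an enum value and a union member, leaving the callback signature unchanged.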

net: rework ndo tc op to consume additional qdisc handle parameter · 4e659d74

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs; however, only support for mqprio was ever added, so we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(back ported from commit e4c6734e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

 Conflicts:
	drivers/net/ethernet/mellanox/mlx4/en_netdev.c

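With a handle available, a driver can tell ingress offloads apart from egress ones by comparing major numbers. The constants below copy the values of TC_H_MAJ_MASK and TC_H_INGRESS from <linux/pkt_sched.h> under SIM_ names so the sketch stands alone. The helper itself is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror TC_H_MAJ_MASK and TC_H_INGRESS in <linux/pkt_sched.h>. */
#define SIM_TC_H_MAJ_MASK 0xFFFF0000U
#define SIM_TC_H_INGRESS  0xFFFFFFF1U
#define SIM_TC_H_MAJ(h)   ((h) & SIM_TC_H_MAJ_MASK)

/* True when an offload request targets the ingress qdisc ("ffff:"). */
static bool sim_handle_is_ingress(uint32_t handle)
{
	return SIM_TC_H_MAJ(handle) == SIM_TC_H_MAJ(SIM_TC_H_INGRESS);
}
```

In the test script earlier in this log, "parent ffff:" corresponds to handle 0xFFFF0000, which this check classifies as ingress.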

sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC · 3425b5da

Tom Herbert authored Dec 14, 2015

BugLink: http://bugs.launchpad.net/bugs/1562326

The SCTP checksum is really a CRC and is very different from the
standard 1's complement checksum that serves as the checksum
for IP protocols. This offload interface is also very different.
Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these
differences. The term CSUM should be reserved in the stack to refer
to the standard 1's complement IP checksum.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 53692b1d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

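To make the distinction concrete, here are both computations side by side in a bitwise, unoptimized form. CRC-32C (the Castagnoli polynomial used by SCTP, reflected constant 0x82F63B78) is polynomial division over GF(2). The IP checksum from RFC 1071 is a folded 1's complement sum. The function names are our own.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli), bitwise, reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Internet checksum (RFC 1071): 1's complement sum of 16-bit words. */
static uint16_t inet_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (; len > 1; p += 2, len -= 2)
		sum += (uint32_t)((p[0] << 8) | p[1]);
	if (len)                    /* pad an odd trailing byte with zero */
		sum += (uint32_t)(p[0] << 8);
	while (sum >> 16)           /* fold carries back into 16 bits */
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For reference, the CRC-32C check value for the ASCII string "123456789" is 0xE3069283. The RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 checksum to 0x220d.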

net: tc: helper functions to query action types · c6a2d32f

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

This is a helper function drivers can use to learn if the
action type is a drop action.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3b01cf56)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: add tc offload feature flag · efe48c0f

John Fastabend authored Feb 16, 2016

BugLink: http://bugs.launchpad.net/bugs/1562326

It's useful to turn off the qdisc offload feature at a per-device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1c78c64e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

fm10k: don't reinitialize RSS flow table when RXFH configured · 12f7db74

Keller, Jacob E authored Feb 08, 2016

BugLink: http://bugs.launchpad.net/bugs/1562310

Also print an error message in case we do have to reconfigure, as this
should no longer happen due to the ethtool changes. If it somehow
does occur, the user should be made aware of it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1012014e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH} · c023158d

Keller, Jacob E authored Feb 08, 2016

BugLink: http://bugs.launchpad.net/bugs/1562310

Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.

Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situations is
desirable because it leads to unexpected behavior or loss of user
configuration.

The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.

This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d4ab4286)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

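The check the commit describes can be sketched as follows. The sketch assumes the indirection table is an array of 0-based Rx channel indexes and that a flag records whether the user changed it. The function name is hypothetical. It rejects a new channel count only when a user-configured table would steer flows to a channel that is being removed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static int sim_check_set_channels(const uint32_t *indir, size_t n,
				  bool rxfh_user_configured,
				  uint32_t new_rx_channels)
{
	uint32_t max_used = 0;

	/* Default table: the driver is free to rebuild it. */
	if (!rxfh_user_configured)
		return 0;

	for (size_t i = 0; i < n; i++)
		if (indir[i] > max_used)
			max_used = indir[i];

	/* Indexes are 0-based: index max_used needs max_used + 1 channels. */
	return new_rx_channels <= max_used ? -EINVAL : 0;
}
```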

net: add netif_is_lag_port helper · ad4c7498

Jiri Pirko authored Dec 03, 2015

BugLink: http://bugs.launchpad.net/bugs/1562310

Some code does not mind if a device is a bond slave or a team port and treats
them the same, as generic LAG ports.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e0ba1414)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: add netif_is_lag_master helper · 394833da

Jiri Pirko authored Dec 03, 2015

BugLink: http://bugs.launchpad.net/bugs/1562310

Some code does not mind if the master is bond or team and treats them
the same, as generic LAG.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7be61833)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

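The LAG helpers compose from the bond and team checks. In this sketch the flag bits are hypothetical, and the kernel's separate flags/priv_flags fields are collapsed into one. It assumes a bond device is recognized by a master or slave bit combined with a bonding bit, and a team device by its own team or team-port bit. A LAG master is then a bond master or a team master, and a LAG port is a bond slave or a team port.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real IFF_* values differ. */
enum {
	SIM_IFF_MASTER    = 1u << 0,
	SIM_IFF_SLAVE     = 1u << 1,
	SIM_IFF_BONDING   = 1u << 2,
	SIM_IFF_TEAM      = 1u << 3,
	SIM_IFF_TEAM_PORT = 1u << 4,
};

struct sim_netdev {
	unsigned int flags;
};

static bool sim_is_bond_master(const struct sim_netdev *d)
{
	return (d->flags & SIM_IFF_MASTER) && (d->flags & SIM_IFF_BONDING);
}

static bool sim_is_bond_slave(const struct sim_netdev *d)
{
	return (d->flags & SIM_IFF_SLAVE) && (d->flags & SIM_IFF_BONDING);
}

static bool sim_is_team_master(const struct sim_netdev *d)
{
	return d->flags & SIM_IFF_TEAM;
}

static bool sim_is_team_port(const struct sim_netdev *d)
{
	return d->flags & SIM_IFF_TEAM_PORT;
}

/* Generic LAG views: callers need not care whether it is bond or team. */
static bool sim_is_lag_master(const struct sim_netdev *d)
{
	return sim_is_bond_master(d) || sim_is_team_master(d);
}

static bool sim_is_lag_port(const struct sim_netdev *d)
{
	return sim_is_bond_slave(d) || sim_is_team_port(d);
}
```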

net: add netif_is_team_port helper · 84783568

Jiri Pirko authored Dec 03, 2015

BugLink: http://bugs.launchpad.net/bugs/1562310

Similar to other helpers, the caller can use this to find out if a device is
a team port.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7f019ee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: add netif_is_team_master helper · 85c5f09b

Jiri Pirko authored Dec 03, 2015

BugLink: http://bugs.launchpad.net/bugs/1562310

Similar to other helpers, the caller can use this to find out if a device is
a team master.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c981e421)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

e1000e: Adds hardware supported cross timestamp on e1000e nic · 84f852cd

Christopher S. Hall authored Feb 22, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

Modern Intel systems support cross timestamping of the network device
clock and Always Running Timer (ART) in hardware.  This allows the
device time and system time to be precisely correlated. The timestamp
pair is returned through e1000e_phc_get_syncdevicetime() used by
get_system_device_crosststamp().  The hardware cross-timestamp result
is made available to applications through the PTP_SYS_OFFSET_PRECISE
ioctl which calls e1000e_phc_getcrosststamp().

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Reworked to use new interface, commit message tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>

(cherry picked from commit 01d7ada5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Config] CONFIG_E1000E_HWTS=y · bce2e6d9

Tim Gardner authored Mar 26, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping · 71d856e1

Christopher S. Hall authored Feb 22, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

Currently, network/system cross-timestamping is performed in the
PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday() and
the gettime64() callback provided by the driver. The cross-timestamp
is best effort where the latency between the capture of system time
(getnstimeofday()) and the device time (driver callback) may be
significant.

The getcrosststamp() callback and corresponding PTP_SYS_OFFSET_PRECISE
ioctl allow the driver to perform this device/system correlation when
for example cross timestamp hardware is available. Modern Intel
systems can do this for onboard Ethernet controllers using the ART
counter. There is virtually zero latency between captures of the ART
and network device clock.

The capabilities ioctl (PTP_CLOCK_GETCAPS) is augmented, allowing
applications to query whether or not drivers implement the
getcrosststamp callback, which provides more precise cross timestamping.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Commit subject tweaks]
Signed-off-by: John Stultz <john.stultz@linaro.org>

(cherry picked from commit 719f1aa4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

x86/tsc: Always Running Timer (ART) correlated clocksource · d30e3cde

Christopher S. Hall authored Feb 29, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneously with the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.

On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:

TSC_value = (ART_value * m) / n + k [n >= 1]

[k is an offset that can be adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build
fixes]
Signed-off-by: John Stultz <john.stultz@linaro.org>

(cherry picked from commit f9677e0f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

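The conversion can be computed without overflowing 64 bits by splitting ART into a quotient and remainder by n. Since ART = q*n + r, floor(ART*m/n) = q*m + floor(r*m/n) exactly. The function name and parameter order are hypothetical: m and n are the CPUID 0x15 ratio terms and k is the offset from the formula above. The sketch assumes m and n are small enough that r*m fits in 64 bits.

```c
#include <stdint.h>

/* TSC = ART * m / n + k, evaluated without forming the full ART * m. */
static uint64_t art_to_tsc(uint64_t art, uint32_t m, uint32_t n, uint64_t k)
{
	uint64_t quot = art / n;  /* n >= 1, as the formula requires */
	uint64_t rem = art % n;

	return quot * m + (rem * m) / n + k;
}
```

For example, ART = 2^62 with m = 4 and n = 2 gives 2^63. The naive ART * m would need 65 bits.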

time/timekeeping: Work around false positive GCC warning · 8d96c55c

Ingo Molnar authored Mar 08, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

Newer GCC versions trigger the following warning:

  kernel/time/timekeeping.c: In function ‘get_device_system_crosststamp’:
  kernel/time/timekeeping.c:987:5: warning: ‘clock_was_set_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
    if (discontinuity) {
     ^
  kernel/time/timekeeping.c:1045:15: note: ‘clock_was_set_seq’ was declared here
    unsigned int clock_was_set_seq;
                 ^

GCC clearly is unable to recognize that the 'do_interp' boolean tracks
the initialization status of 'clock_was_set_seq'.

The GCC version used was:

  gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)

Work it around by initializing clock_was_set_seq to 0. Compilers that
are able to recognize the code flow will eliminate the unnecessary
initialization.
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 6436257b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

time: Add history to cross timestamp interface supporting slower devices · 63cf335f

Christopher S. Hall authored Feb 22, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

Another representative use case of time sync and the correlated
clocksource (in addition to PTP noted above) is PTP synchronized
audio.

In a streaming application, as an example, samples will be sent and/or
received by multiple devices with a presentation time that is in terms
of the PTP master clock. Synchronizing the audio output on these
devices requires correlating the audio clock with the PTP master
clock. The more precise this correlation is, the better the audio
quality (i.e. out of sync audio sounds bad).

From an application standpoint, to correlate the PTP master clock with
the audio device clock, the system clock is used as an intermediate
timebase. The transforms such an application would perform are:

    System Clock <-> Audio clock
    System Clock <-> Network Device Clock [<-> PTP Master Clock]

Modern Intel platforms can perform a more accurate cross timestamp in
hardware (ART, audio device clock). The audio driver requires
ART->system time transforms -- the same as required for the network
driver. These platforms offload audio processing (including
cross-timestamps) to a DSP which, to ensure uninterrupted audio
processing, communicates and responds to the host only once every
millisecond. As a result, it takes up to a millisecond for the DSP to
receive a request; then the request is processed by the DSP, the audio
output hardware is polled for completion, the result is copied into
shared memory, and the host is notified. All of these operations occur
on a millisecond cadence. This transaction requires about 2 ms, but
under heavier workloads it may take up to 4 ms.

Adding a history allows these slow devices the option of providing an
ART value outside of the current interval. In this case, the callback
provided is an accessor function for the previously obtained counter
value. If get_device_system_crosststamp() receives a counter value
older than cycle_last, it consults the history provided as an
argument in history_ref and interpolates the realtime and monotonic
raw system time using the provided counter value. If there are any
clock discontinuities, e.g. from calling settimeofday(), the monotonic
raw time is interpolated in the usual way, but the realtime clock time
is adjusted by scaling the monotonic raw adjustment.
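The interpolation described above can be sketched as follows. This is a
simplified illustration, not the kernel implementation: `struct snap`,
`interp_crossts()` and its `mult_mono`/`mult_raw` parameters (stand-ins
for the timekeeper's NTP-adjusted and unadjusted clocksource
multipliers) are hypothetical, counter wraparound is ignored, and the
128-bit multiply relies on the GCC/Clang `unsigned __int128` extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a timekeeping snapshot. */
struct snap {
	uint64_t cycles;  /* counter value when the snapshot was taken */
	uint64_t real_ns; /* realtime clock at that counter value */
	uint64_t raw_ns;  /* monotonic raw clock at that counter value */
};

/* a * b / c without intermediate overflow (GCC/Clang extension). */
static uint64_t mul_div_u64(uint64_t a, uint64_t b, uint64_t c)
{
	return (uint64_t)((unsigned __int128)a * b / c);
}

/*
 * Interpolate system times for 'counter', which must satisfy
 * hist->cycles <= counter <= end->cycles ('end' plays the role of
 * cycle_last). Works backward from 'end' by partial/total of the
 * history interval. Returns false if the history does not cover it.
 */
static bool interp_crossts(const struct snap *hist, const struct snap *end,
			   uint64_t counter, bool discontinuity,
			   uint32_t mult_mono, uint32_t mult_raw,
			   uint64_t *real_ns, uint64_t *raw_ns)
{
	uint64_t total, partial, corr_raw, corr_real;

	if (counter < hist->cycles || counter > end->cycles)
		return false;

	total = end->cycles - hist->cycles;
	partial = end->cycles - counter;
	if (total == 0 || partial == 0) {
		*real_ns = end->real_ns;
		*raw_ns = end->raw_ns;
		return true;
	}

	/* Monotonic raw is never stepped, so interpolate it directly. */
	corr_raw = mul_div_u64(end->raw_ns - hist->raw_ns, partial, total);

	if (discontinuity)
		/* Realtime was stepped: derive its correction from raw. */
		corr_real = mul_div_u64(corr_raw, mult_mono, mult_raw);
	else
		corr_real = mul_div_u64(end->real_ns - hist->real_ns,
					partial, total);

	*raw_ns = end->raw_ns - corr_raw;
	*real_ns = end->real_ns - corr_real;
	return true;
}
```

With a settimeofday() step inside the history, interpolating realtime
directly would spread the step across the interval; scaling the raw
correction instead keeps the realtime result consistent with the clock
as it runs after the step.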

When an accessor function is used, a history argument *must* be
provided. The history is initialized using ktime_get_snapshot(), which
must be called before the counter values are read.
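The rules above decide which path a cross timestamp request can take. A
minimal sketch, assuming a non-wrapping counter; `choose_path()` and
`enum xts_path` are hypothetical names. It also shows why the snapshot
must come first: a counter value read before the snapshot precedes the
history and cannot be interpolated.

```c
#include <stddef.h>
#include <stdint.h>

enum xts_path {
	XTS_DIRECT,      /* counter lies in the current update interval */
	XTS_INTERPOLATE, /* counter is older, but covered by the history */
	XTS_INVALID,     /* counter cannot be converted */
};

/*
 * history_cycles: counter value recorded by the snapshot, or NULL if
 * the caller supplied no history. interval_start corresponds to
 * cycle_last; now is the current counter value.
 */
static enum xts_path choose_path(const uint64_t *history_cycles,
				 uint64_t counter, uint64_t interval_start,
				 uint64_t now)
{
	if (counter > now)
		return XTS_INVALID;     /* value from the future */
	if (counter >= interval_start)
		return XTS_DIRECT;
	if (!history_cycles)
		return XTS_INVALID;     /* older value but no history given */
	if (counter >= *history_cycles)
		return XTS_INTERPOLATE;
	return XTS_INVALID;             /* read before the snapshot was taken */
}
```

For example, with the history at counter 500 and the interval starting
at 1000, a value of 700 is interpolated, while the same value with a
snapshot taken at 800 is rejected.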

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Fixed up cycles_t/cycle_t type confusion]
Signed-off-by: John Stultz <john.stultz@linaro.org>

(cherry picked from commit 2c756feb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

time: Add driver cross timestamp interface for higher precision time synchronization · 8a756d81

Christopher S. Hall authored Feb 22, 2016

BugLink: http://bugs.launchpad.net/bugs/1519625

ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner
<tglx@linutronix.de>. It has changed considerably and any mistakes are
mine.

The precision with which events on multiple networked systems can be
synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
by the precision of the cross timestamps between the system clock and
the device (timestamp) clock. Precision here is the degree of
simultaneity when capturing the cross timestamp.

Currently the PTP cross timestamp is captured in software using the
PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
interleaved with reads of the realtime clock. At best, the precision
of this cross timestamp is on the order of several microseconds due to
software latencies. Sub-microsecond precision is required for
industrial control and some media applications. To achieve this level
of precision, hardware-supported cross timestamping is needed.
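The software method can be sketched as below: each device read is
bracketed by two system clock reads, the midpoint of the bracket is
taken as the matching system time, and the narrowest bracket wins. This
is a sketch under stated assumptions: the array layout mirrors the
sys/dev/sys interleaving that PTP_SYS_OFFSET returns, simplified to
int64_t nanoseconds instead of struct ptp_clock_time, and
`best_sw_crossts()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * ts[] holds 2*n + 1 nanosecond readings interleaved as
 * sys, dev, sys, dev, ..., sys. For each device reading, estimate the
 * matching system time as the midpoint of the bracketing system reads
 * and keep the sample with the narrowest bracket.
 * Returns the bracket width (the uncertainty bound), or -1 if n == 0.
 */
static int64_t best_sw_crossts(const int64_t *ts, size_t n,
			       int64_t *sys_ns, int64_t *dev_ns)
{
	int64_t best = -1;

	for (size_t i = 0; i < n; i++) {
		int64_t t1 = ts[2 * i];      /* system read before */
		int64_t dev = ts[2 * i + 1]; /* device read */
		int64_t t2 = ts[2 * i + 2];  /* system read after */
		int64_t width = t2 - t1;

		if (best < 0 || width < best) {
			best = width;
			*sys_ns = t1 + width / 2;
			*dev_ns = dev;
		}
	}
	return best;
}
```

The returned width bounds the error of the pairing. Interrupts, cache
misses and bus latency all widen it, which is why a purely software
cross timestamp is limited to roughly microsecond precision.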

The function get_device_system_crosststamp() allows device drivers
to return a cross timestamp with system time properly scaled to
nanoseconds. The realtime value is needed to discipline that clock
using PTP, and the monotonic raw value is used for applications that
don't require a "real" time but need an unadjusted clock time. The
get_device_system_crosststamp() code calls back into the driver to
ensure that the system counter is within the current timekeeping
update interval.
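The callback flow can be sketched in user space as follows. It is a
simplified model, not the kernel API: `get_time_fn_t`,
`struct tk_interval`, `crossts_from_driver()` and the example
`latched_get_time()` callback are hypothetical, and the conversion uses
clocksource-style `(delta * mult) >> shift` scaling with made-up
multipliers. The key step is the range check: only a counter value
inside the current update interval is converted directly.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical snapshot of the current timekeeping update interval. */
struct tk_interval {
	uint64_t cycle_last;   /* counter value at the start of the interval */
	uint64_t cycle_now;    /* counter value read just now */
	uint64_t real_base_ns; /* realtime at cycle_last */
	uint64_t raw_base_ns;  /* monotonic raw time at cycle_last */
	uint32_t real_mult, raw_mult, shift;
};

struct crossts {
	uint64_t device_ns, sys_real_ns, sys_raw_ns;
};

/* Driver callback: report a device time and the system counter value
 * captured simultaneously with it (hypothetical signature). */
typedef int (*get_time_fn_t)(uint64_t *device_ns, uint64_t *counter,
			     void *ctx);

/* delta spans at most one update interval, so delta * mult fits in 64 bits. */
static uint64_t cyc2ns(uint64_t delta, uint32_t mult, uint32_t shift)
{
	return (delta * mult) >> shift;
}

static int crossts_from_driver(get_time_fn_t fn, void *ctx,
			       const struct tk_interval *tk,
			       struct crossts *out)
{
	uint64_t counter, delta;
	int ret = fn(&out->device_ns, &counter, ctx);

	if (ret)
		return ret;
	/* Only counter values inside the current interval convert directly. */
	if (counter < tk->cycle_last || counter > tk->cycle_now)
		return -EINVAL;

	delta = counter - tk->cycle_last;
	out->sys_real_ns = tk->real_base_ns + cyc2ns(delta, tk->real_mult, tk->shift);
	out->sys_raw_ns = tk->raw_base_ns + cyc2ns(delta, tk->raw_mult, tk->shift);
	return 0;
}

/* Example callback: ctx points at a latched (device_ns, counter) pair. */
struct latched {
	uint64_t device_ns, counter;
};

static int latched_get_time(uint64_t *device_ns, uint64_t *counter, void *ctx)
{
	const struct latched *l = ctx;

	*device_ns = l->device_ns;
	*counter = l->counter;
	return 0;
}
```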

Modern Intel hardware provides an Always Running Timer (ART) which is
exactly related to TSC through a known frequency ratio. The ART is
routed to devices on the system and is used to precisely and
simultaneously capture the device clock with the ART.
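Because the ratio is fixed, converting an ART value to TSC is a scaled
multiply. A minimal sketch, assuming the ratio num/den is known (on x86,
CPUID leaf 0x15 reports the TSC to core crystal clock ratio as EBX/EAX)
and treating `offset` as a hypothetical fixed ART-to-TSC offset;
`art_to_tsc()` is an illustrative name. Splitting the ART value into
quotient and remainder keeps the intermediate products within 64 bits.

```c
#include <stdint.h>

/*
 * tsc = art * num / den + offset, computed as
 * (art / den) * num + ((art % den) * num) / den + offset.
 * rem < den, so rem * num fits in 64 bits for 32-bit num/den, and
 * quot * num is no larger than the TSC result itself.
 */
static uint64_t art_to_tsc(uint64_t art, uint32_t num, uint32_t den,
			   uint64_t offset)
{
	uint64_t quot = art / den;
	uint64_t rem = art % den;

	return quot * num + (rem * num) / den + offset;
}
```

The split gives the same result as the exact floor(art * num / den),
since quot * num is an integer, while avoiding the overflow a direct
art * num would hit for large ART values.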

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Reworked to remove extra structures and simplify calling]
Signed-off-by: John Stultz <john.stultz@linaro.org>

(cherry picked from commit 8006c245)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
