Commits · d2b59eefb500ce0facde7711c9ad7e2201162daa · Kirill Smelkov / linux

19 Sep, 2016 5 commits

UBUNTU: SAUCE: nvme: Don't suspend admin queue that wasn't created · d2b59eef

Gabriel Krisman Bertazi authored Sep 13, 2016

Pending 4.8-rc merge.
BugLink: http://bugs.launchpad.net/bugs/1602724

This fixes a regression in my previous commit c21377f8 ("nvme:
Suspend all queues before deletion"), which provoked an Oops in the
removal path when removing a device that became IO incapable very early
at probe (i.e. after a failed EEH recovery).

Turns out, if the error occurred very early at the probe path, before
even configuring the admin queue, we might try to suspend the
uninitialized admin queue, accessing bad memory.

Fixes: c21377f8 ("nvme: Suspend all queues before deletion")
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

d2b59eef

blk-mq: don't overwrite rq->mq_ctx · f463371e

Jens Axboe authored Sep 06, 2016

BugLink: http://bugs.launchpad.net/bugs/1620317

We do this in a few places, if the CPU is offline. This isn't allowed,
though, since on multi queue hardware, we can't just move a request
from one software queue to another, if they map to different hardware
queues. The request and tag isn't valid on another hardware queue.

This can happen if plugging races with CPU offlining. But it does
no harm, since it can only happen in the window where we are
currently busy freezing the queue and flushing IO, in preparation
for redoing the software <-> hardware queue mappings.
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e57690fe)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

f463371e

blk-mq: improve warning for running a queue on the wrong CPU · 8b6f6135

Jens Axboe authored Sep 06, 2016

BugLink: http://bugs.launchpad.net/bugs/1620317

__blk_mq_run_hw_queue() currently warns if we are running the queue on a
CPU that isn't set in its mask. However, this can happen if a CPU is
being offlined, and the workqueue handling will place the work on CPU0
instead. Improve the warning so that it only triggers if the batch cpu
in the hardware queue is currently online.  If it triggers for that
case, then it's indicative of a flow problem in blk-mq, so we want to
retain it for that case.
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0e87e58b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

8b6f6135

blk-mq: Allow timeouts to run while queue is freezing · 702b11d1

Gabriel Krisman Bertazi authored Sep 06, 2016

BugLink: http://bugs.launchpad.net/bugs/1620317

In case a submitted request gets stuck for some reason, the block layer
can prevent the request starvation by starting the scheduled timeout work.
If this stuck request occurs at the same time another thread has started
a queue freeze, the blk_mq_timeout_work will not be able to acquire the
queue reference and will return silently, thus not issuing the timeout.
But since the request is already holding a q_usage_counter reference and
is unable to complete, it will never release its reference, preventing
the queue from completing the freeze started by first thread.  This puts
the request_queue in a hung state, forever waiting for the freeze
completion.

This was observed while running IO to a NVMe device at the same time we
toggled the CPU hotplug code. Eventually, once a request got stuck
requiring a timeout during a queue freeze, we saw the CPU Hotplug
notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
the trace below.

[c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
[c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
[c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
[c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
[c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
[c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
[c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
[c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
[c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
[c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
[c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
[c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
[c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
[c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
[c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
[c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
[c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
[c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
[c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
[c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4

The fix is to allow the timeout work to execute in the window between
dropping the initial refcount reference and the release of the last
reference, which actually marks the freeze completion.  This can be
achieved with percpu_refcount_tryget, which does not require the counter
to be alive.  This way the timeout work can do it's job and terminate a
stuck request even during a freeze, returning its reference and avoiding
the deadlock.

Allowing the timeout to run is just a part of the fix, since for some
devices, we might get stuck again inside the device driver's timeout
handler, should it attempt to allocate a new request in that path -
which is a quite common action for Abort commands, which need to be sent
after a timeout.  In NVMe, for instance, we call blk_mq_alloc_request
from inside the timeout handler, which will fail during a freeze, since
it also tries to acquire a queue reference.

I considered a similar change to blk_mq_alloc_request as a generic
solution for further device driver hangs, but we can't do that, since it
would allow new requests to disturb the freeze process.  I thought about
creating a new function in the block layer to support unfreezable
requests for these occasions, but after working on it for a while, I
feel like this should be handled in a per-driver basis.  I'm now
experimenting with changes to the NVMe timeout path, but I'm open to
suggestions of ways to make this generic.
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71f79fb3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

702b11d1

btrfs: bugfix: handle FS_IOC32_{GETFLAGS, SETFLAGS, GETVERSION} in btrfs_ioctl · 50603667

Luke Dashjr authored Sep 06, 2016

BugLink: http://bugs.launchpad.net/bugs/1619918

32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
fail.
Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org>
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
(back ported from commit 4c63c245)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

 Conflicts:
	fs/btrfs/ctree.h
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

50603667

16 Sep, 2016 35 commits

drm/radeon/dp: add back special handling for NUTMEG · 9a9ef3ed

Alex Deucher authored Sep 12, 2016

BugLink: http://bugs.launchpad.net/bugs/1600092

When I fixed the dp rate selection in:
092c96a8
drm/radeon: fix dp link rate selection (v2)
I accidently dropped the special handling for NUTMEG
DP bridge chips.  They require a fixed link rate.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Ken Moffat <zarniwhoop@ntlworld.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit c8213a63)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

9a9ef3ed

qed: add MODULE_FIRMWARE() · 09a079ef

Yuval Mintz authored Feb 24, 2016

BugLink: http://bugs.launchpad.net/bugs/1623187

Module is using a binary firmware file and so should be marked as such.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d43d3f0f)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

09a079ef

ixgbevf: Use mac_ops instead of trying to identify NIC type · 711bd856

Alexander Duyck authored Apr 22, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

This change makes it so that we can just use function pointers instead of
having to identify if a given VF is running on a Linux or Windows PF.  By
doing this we can avoid having to pull too much information out of the
lower layers and can instead just make use of the mac_ops pointers since
they should differ between the two types of VFs anyway.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2f8214fe)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

711bd856

ixgbevf: Change the relaxed order settings in VF driver for sparc · c6aa6fb1

Babu Moger authored Apr 21, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

We noticed performance issues with VF interface on sparc compared
to PF. Setting the RX to IXGBE_DCA_RXCTRL_DATA_WRO_EN brings it
on far with PF. Also this matches to the default sparc setting in
PF driver.
Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 33b0eb15)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

c6aa6fb1

ixgbevf: Support Windows hosts (Hyper-V) · db6edd31

KY Srinivasan authored Apr 19, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

On Hyper-V, the VF/PF communication is a via software mediated path
as opposed to the hardware mailbox. Make the necessary
adjustments to support Hyper-V.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c6d45171)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

db6edd31

ixgbevf: Add the device ID's presented while running on Hyper-V · 377b7d21

KY Srinivasan authored Apr 19, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Intel SR-IOV cards present different ID when running on Hyper-V.
Add the device IDs presented while running on Hyper-V.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b4363fbd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

377b7d21

ixgbevf: Move API negotiation function into mac_ops · 97ca47c9

Alexander Duyck authored Apr 14, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

This patch moves API negotiation into mac_ops.  The general idea here is
that with HyperV on the way we need to make certain that anything that will
have different versions between HyperV and a standard VF needs to be
abstracted enough so that we can have a separate function between the two
so we can avoid changes in one breaking something in the other.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7921f4dc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

97ca47c9

ixgbevf: make use of BIT() macro to avoid shift of signed values · c5ff746f

Jacob Keller authored Apr 13, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Also cleanup a case where we're bit shifting a value into place, and use
an unsigned constant. Make use of the unsigned postfix in places where
BIT() macro is not appropriate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8d055cc0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

c5ff746f

ixgbevf: add support for per-queue ethtool stats · 06a55c6d

Emil Tantilov authored Apr 07, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Implement per-queue statistics for packets, bytes and busy poll
specific counters.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a02a5a53)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

06a55c6d

ixgbevf: refactor ethtool stats handling · 8df3034f

Emil Tantilov authored Apr 07, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

This brings the logic closer to how we handle the stats in ixgbe and it
sets us up for introducing per-queue stats.

Use IXGBEVF_STAT and IXGBEVF_NETDEV_STAT for accessing the driver and
netdev stats respectively. This way we don't have to calculate the
stats based on register values which could lead to the counters not
being initialized properly when the interface is down.

IXGBEVF_QUEUE_STATS_LEN is set to include the number of queues.

Also some defines were renamed to use the IXGBEVF prefix.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d72d6c19)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

8df3034f

ixgbe/ixgbevf: Add support for bulk free in Tx cleanup & cleanup boolean logic · 97feb7bc

Alexander Duyck authored Mar 07, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

This patch enables bulk free in Tx cleanup for ixgbevf and cleans up the
boolean logic in the polling routines for ixgbe and ixgbevf in the hopes of
avoiding any mix-ups similar to what occurred with i40e and i40evf.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8220bbc1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

97feb7bc

ixgbevf: Add support for generic Tx checksums · 308fa9bd

Alexander Duyck authored Jan 13, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

This patch adds support for generic Tx checksums to the ixgbevf driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
		skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
		skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
		skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.  In
the case of the VF drivers this meant adding support for SCTP CRCs, and
inner checksum offloads for MPLS and various tunnel types.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cb2b3edb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

308fa9bd

ixgbevf: use bit operations for setting and checking resets · c7463e6e

Emil Tantilov authored Dec 17, 2015

BugLink: http://bugs.launchpad.net/bugs/1616677

Move the reset flags to adapter->state in order to make use of bit
operations.

This is an alternative patch to the one previously submitted by
John Greene.
Suggested-by: Alexander Duyck <aduyck@mirantis.com>
Reported-by: Scott Otto <otts62@yahoo.com>
Reported-by: John Greene <jogreene@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d5dd7c3f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

c7463e6e

ixgbevf: fix error code path when setting MAC address · 5c9b8cea

Emil Tantilov authored Feb 10, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Return error when a MAC address change is rejected by the PF.

This will prevent the user from modifying the MAC address when
that operation is not permitted.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 32ca6868)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

5c9b8cea

ixgbevf: call ndo_stop() instead of dev_close() when running offline selftest · 76eb020f

Stefan Assmann authored Feb 03, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 324d0867)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

76eb020f

Drivers: hv: vmbus: Use the new virt_xx barrier code · 1cd8979b

K. Y. Srinivasan authored Apr 02, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Use the virt_xx barriers that have been defined for use in virtual machines.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(back ported from commit dcd0eeca)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

 Conflicts:
	drivers/hv/ring_buffer.c
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

1cd8979b

hv_netvsc: fix bonding devices check in netvsc_netdev_event() · 40d0b514

Vitaly Kuznetsov authored Aug 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, there is an uncertainty when a slave
interface is removed: if bonding module comes first in netdev_chain it
clears IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles NETDEV_UNREGISTER event, but in case netvsc comes first on the
chain it sees the device with IFF_BONDING still attached and skips it. As
we still hold vf_netdev pointer to the device we crash on the next inject.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0dbff144)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

40d0b514

hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev · e2f20fb6

Vitaly Kuznetsov authored Aug 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0f20d795)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

e2f20fb6

hv_netvsc: reset vf_inject on VF removal · 54c9fccb

Vitaly Kuznetsov authored Aug 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57c1826b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

54c9fccb

hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait · 5424f383

Vitaly Kuznetsov authored Aug 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
  is being executed from netdev notifier chain we hold rtnl lock when we
  get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
  wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
  will hang on rtnl_lock().
- deadlock!

Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d072218f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

5424f383

hv_netvsc: don't lose VF information · e2d7d2a8

Vitaly Kuznetsov authored Aug 15, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f9a7da91)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

e2d7d2a8

hv_netvsc: Fix VF register on bonding devices · c4bd91b5

Haiyang Zhang authored Jul 22, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Added a condition to avoid bonding devices with same MAC registering
as VF.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2b9f1f7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

c4bd91b5

PCI: hv: Fix interrupt cleanup path · 3f39c1ee

Cathy Avery authored Jul 12, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

SR-IOV disabled from the host causes a memory leak.  pci-hyperv usually
first receives a PCI_EJECT notification and then proceeds to delete the
hpdev list entry in hv_eject_device_work().  Later in hv_msi_free() since
the device is no longer on the device list hpdev is NULL and hv_msi_free
returns without freeing int_desc as part of hv_int_desc_free().
Signed-off-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
(cherry picked from commit 0c6e617f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

3f39c1ee

x86/kernel: Audit and remove any unnecessary uses of module.h · 1a457308

Paul Gortmaker authored Jul 13, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.  Build testing
revealed some implicit header usage that was fixed up accordingly.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.comSigned-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 186f4360)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

1a457308

netvsc: Use the new in-place consumption APIs in the rx path · 000563a6

K. Y. Srinivasan authored Jul 05, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Use the new APIs for eliminating a copy on the receive path. These new APIs also
help in minimizing the number of memory barriers we end up issuing (in the
ringbuffer code) since we can better control when we want to expose the ring
state to the host.

The patch is being resent to address earlier email issues.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 99a50bb1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

000563a6

PCI: hv: Handle all pending messages in hv_pci_onchannelcallback() · f47e0c02

Vitaly Kuznetsov authored Jun 17, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

When we have an interrupt from the host we have a bit set in event page
indicating there are messages for the particular channel.  We need to read
them all as we won't get signaled for what was on the queue before we
cleared the bit in vmbus_on_event().  This applies to all Hyper-V drivers
and the pass-through driver should do the same.

I did not meet any bugs; the issue was found by code inspection.  We don't
have many events going through hv_pci_onchannelcallback(), which explains
why nobody reported the issue before.

While on it, fix handling non-zero vmbus_recvpacket_raw() return values by
dropping out.  If the return value is not zero, it is wrong to inspect
buffer or bytes_recvd as these may contain invalid data.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
(cherry picked from commit 837d741e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

f47e0c02

PCI: hv: Don't leak buffer in hv_pci_onchannelcallback() · caee5e30

Vitaly Kuznetsov authored May 30, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

We don't free buffer on several code paths in hv_pci_onchannelcallback(),
put kfree() to the end of the function to fix the issue.  Direct { kfree();
return; } can now be replaced with a simple 'break';
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
(cherry picked from commit 60fcdac8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

caee5e30

netvsc: get rid of completion timeouts · 550163bc

Vitaly Kuznetsov authored Jun 09, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting

        net_device->max_chn = 1;
        net_device->num_chn = 1;
        net_device->num_sc_offered = 0;

but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.

The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.

Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5362855a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

550163bc

hv_netvsc: pass struct net_device to rndis_filter_set_offload_params() · 642ddcee

Vitaly Kuznetsov authored Jun 03, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 426d9541)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

642ddcee

hv_netvsc: pass struct net_device to rndis_filter_set_device_mac() · 266e0a2c

Vitaly Kuznetsov authored Jun 03, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e834da9a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

266e0a2c

hv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}() · 998c6801

Vitaly Kuznetsov authored Jun 03, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2f5fa6c8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

998c6801

hv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers · 7117ad54

Vitaly Kuznetsov authored Jun 03, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2625466d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

7117ad54

hv_netvsc: remove redundant assignment in netvsc_recv_callback() · 23085756

Vitaly Kuznetsov authored Jun 03, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4baa994d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

23085756

hv_netvsc: Fix VF register on vlan devices · 64cf6298

Haiyang Zhang authored Jun 02, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Added a condition to avoid vlan devices with same MAC registering
as VF.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb2911fe)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

64cf6298

hv_netvsc: set nvdev link after populating chn_table · cb34dd79

Vitaly Kuznetsov authored May 13, 2016

BugLink: http://bugs.launchpad.net/bugs/1616677

Crash in netvsc_send() is observed when netvsc device is re-created on
mtu change/set channels. The crash is caused by dereferencing of NULL
channel pointer which comes from chn_table. The root cause is a mixture
of two facts:
- we set nvdev pointer in net_device_context in alloc_net_device()
  before we populate chn_table.
- we populate chn_table[0] only.

The issue could be papered over by checking channel != NULL in
netvsc_send() but populating the whole chn_table and writing the
nvdev pointer afterwards seems more appropriate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88098834)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

cb34dd79