Commits · eaf4c6d46a6948302b64be2b7149cce22131ee0d · nexedi / linux

05 May, 2016 7 commits

RDMA/iw_cxgb4: remove abort_connection() usage from accept/reject · eaf4c6d4

Hariprasad S authored May 05, 2016

Use c4iw_ep_disconnect() instead. This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

eaf4c6d4

RDMA/iw_cxgb4: free resources when send_flowc() fails · fef4422d

Hariprasad S authored May 05, 2016

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

fef4422d

RDMA/iw_cxgb4: remove connection abort from process_mpa_reply · f8e1e1d1

Hariprasad S authored May 05, 2016

Instead, have the caller, rx_data() handle the close/abort like
it does for process_mpa_request(). This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f8e1e1d1

RDMA/iw_cxgb4: ensure eps don't get freed while the mutex is held · 6e410d8f

Hariprasad S authored May 05, 2016

In rx_data(), with the ep in FPDU_MODE, refcnt=2, if we get unexpected
streaming data, we call c4iw_modify_rc_qp() and move the qp from
RTS -> TERMINATE. In c4iw_modify_rc_qp(), if rdma_fini() returns
an error, the ep will be dereferenced (refcnt=1). Then rx_data()
calls c4iw_ep_disconnect() which starts the close operation.
But if send_halfclose() fails in c4iw_ep_disconnect(), we will call
release_ep_resources() derefing the ep which reduces the refcnt to 0 and
and frees the ep. However we still has the ep mutex at that point, so we
have a touch-after-free bug. There is a similar issue where
peer_close() calls c4iw_ep_disconnect().

The solution is to add a reference to the ep in c4iw_ep_disconnect()
after acquiring the mutex, and release it after releasing the mutex.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6e410d8f

RDMA/iw_cxgb4: stop ep timer on close failure · 88bc230d

Hariprasad S authored May 05, 2016

In c4iw_ep_disconnect(), if we start the ep timer to begin a close,
but send_halfclose() fails, we need to stop the timer and send a CLOSE
event up to the IWCM before releasing the resources. Otherwise, we can
crash when the ep timer fires if the ep is referencing a previous instance
of the device. This can happen as part of adapter reset/recovery, for
instance.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

88bc230d

RDMA/iw_cxgb4: release ep resources on accept arp failure · 9dec900c

Hariprasad S authored May 05, 2016

If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid
will be stuck in SYN_PEND and never released.  So create an arp failure
handler specifically for this message to release the endpoint resources.

In pass_accept_rpl_arp_failure(), put the parent endpoint so it will
be freed when destroyed.  Also we don't need to call release_tid() here
because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the
hwtid.

If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the
peer's ACK to our SYN is never received), then put the parent as well
in peer_abort().

Treat accept_cr() failures just like arp failures: put the parent ep
and release the ep resources destroying the tid

The ARP failure handlers are called in an atomic context, so we need to
schedule some of the processing which might block.  Namely _c4iw_free_ep()
which needs a mutex.  So create a "special" CPL opcode and handler and
schedule it via sched() to be run by process_work() in a blockable context.

Also rework the active open arp failure handler to make use of
release_ep_resources().  This allows both the active and passive arp
failure handlers to use the same deferred cleanup function.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9dec900c

IB/iser: Fix max_sectors calculation · 9c674815

Christoph Hellwig authored Apr 18, 2016

iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.

This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but actually be one lower than the page size multiplied
by the number of sectors, as it has to handle the case of non-aligned I/O.

Without this I get trivial to reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9c674815

29 Apr, 2016 1 commit

RDMA/nes: don't leak skb if carrier down · 4c8bb959

Florian Westphal authored Apr 24, 2016

Alternatively one could free the skb, OTOH I don't think this test is
useful so just remove it.

Cc: <linux-rdma@vger.kernel.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4c8bb959

28 Apr, 2016 10 commits

IB/security: Restrict use of the write() interface · e6bd18f5

Jason Gunthorpe authored Apr 10, 2016

The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl().  This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.

For the immediate repair, detect and deny suspicious accesses to
the write API.

For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).

The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.
Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>

e6bd18f5

IB/hfi1: Use kernel default llseek for ui device · 7723d8c2

Dean Luick authored Apr 22, 2016

The ui device llseek had a mistake with SEEK_END and did
not fully follow seek semantics.  Correct all this by
using a kernel supplied function for fixed size devices.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7723d8c2

IB/hfi1: Don't attempt to free resources if initialization failed · 94158442

Mitko Haralanov authored Apr 20, 2016

Attempting to free resources which have not been allocated and
initialized properly led to the following kernel backtrace:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    PGD 852a43067 PUD 85d4a6067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 2831 Comm: osu_bw Tainted: G          IO 3.12.18-wfr+ #1
    task: ffff88085b15b540 ti: ffff8808588fe000 task.ti: ffff8808588fe000
    RIP: 0010:[<ffffffffa09658fe>]  [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    RSP: 0018:ffff8808588ffde0  EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff880858a31800 RCX: 0000000000000000
    RDX: ffff88085d971bc0 RSI: ffff880858a318f8 RDI: ffff880858a318c0
    RBP: ffff8808588ffe20 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff88087ffd6f40 R11: 0000000001100348 R12: ffff880852900000
    R13: ffff880858a318c0 R14: 0000000000000000 R15: ffff88085d971be8
    FS:  00007f4674e83740(0000) GS:ffff88087f400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000085c377000 CR4: 00000000001407f0
    Stack:
     ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
     ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
     ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
    Call Trace:
     [<ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
     [<ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
     [<ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
     [<ffffffff8116c189>] __fput+0xe9/0x270
     [<ffffffff8116c35e>] ____fput+0xe/0x10
     [<ffffffff81065707>] task_work_run+0xa7/0xe0
     [<ffffffff81002969>] do_notify_resume+0x59/0x80
     [<ffffffff814ffc1a>] int_signal+0x12/0x17

This commit re-arranges the context initialization code in a way that
would allow for context event flags to be used to determine whether
the context has been successfully initialized.

In turn, this can be used to skip the resource de-allocation if they
were never allocated in the first place.

Fixes: 3abb33ac ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com.
Signed-off-by: Doug Ledford <dledford@redhat.com>

94158442

IB/hfi1: Fix missing lock/unlock in verbs drain callback · b9b06cb6

Mike Marciniszyn authored Apr 20, 2016

The iowait_sdma_drained() callback lacked locking to
protect the qp s_flags field.

This causes the s_flags to be out of sync
on multiple CPUs, potentially corrupting the s_flags.

Fixes: a545f530 ("staging/rdma/hfi: fix CQ completion order issue")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b9b06cb6

IB/rdmavt: Fix send scheduling · e6d2e017

Jubin John authored Apr 12, 2016

call_send is used to determine whether to send immediately or schedule
a send for later. The current logic in rdmavt is inverted and has a
negative impact on the latency of the hfi1 and qib drivers. Fix this
regression by correctly calling send immediately when call_send is set.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e6d2e017

IB/hfi1: Prevent unpinning of wrong pages · 849e3e93

Mitko Haralanov authored Apr 12, 2016

The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.

In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.

There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.

This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

849e3e93

IB/hfi1: Fix deadlock caused by locking with wrong scope · de82bdff

Mitko Haralanov authored Apr 12, 2016

The locking around the interval RB tree is designed to prevent
access to the tree while it's being modified. The locking in its
current form is too overzealous, which is causing a deadlock in
certain cases with the following backtrace:

    Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
    CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G           O 3.12.18-wfr+ #1
     0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
     ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
     ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
    Call Trace:
     <NMI>  [<ffffffff814f1caa>] dump_stack+0x45/0x56
     [<ffffffff814ecd56>] panic+0xc2/0x1cb
     [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
     [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
     [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
     [<ffffffff8110a714>] perf_event_overflow+0x14/0x20
     [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
     [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
     [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
     [<ffffffff814f8d39>] do_nmi+0x169/0x310
     [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
     [<ffffffff81272600>] ? unmap_single+0x30/0x30
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     <<EOE>>  <IRQ>  [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
     [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
     [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
     [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
     [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
     [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
     [<ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
     [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
     [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
     [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
     [<ffffffff81098017>] handle_irq_event+0x37/0x60
     [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
     [<ffffffff810044af>] handle_irq+0xbf/0x150
     [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80
     [<ffffffff8150168d>] do_IRQ+0x4d/0xc0
     [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
     <EOI>  [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0
     [<ffffffff810763a6>] __cond_resched+0x26/0x30
     [<ffffffff814f5eda>] _cond_resched+0x3a/0x50
     [<ffffffff814f4f82>] down_write+0x12/0x30
     [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
     [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
     [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
     [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
     [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
     [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
     [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
     [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230
     [<ffffffff811a9da1>] ? fsnotify+0x241/0x320
     [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff8116b795>] vfs_writev+0x35/0x60
     [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0
     [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
     [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b

As evident from the backtrace above, the process was being put to sleep
while holding the lock.

Limiting the scope of the lock only to the RB tree operation fixes the
above error allowing for proper locking and the process being put to
sleep when needed.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

de82bdff

IB/hfi1: Prevent NULL pointer deferences in caching code · f19bd643

Mitko Haralanov authored Apr 12, 2016

There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.

The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15
    task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878  EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40

To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f19bd643

MAINTAINERS: Update iser/isert maintainer contact info · e7d2c25d

Sagi Grimberg authored Apr 03, 2016

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e7d2c25d

IB/mlx5: Expose correct max_sge_rd limit · 986ef95e

Sagi Grimberg authored Mar 31, 2016

mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)

So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.

Cc: linux-stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

986ef95e

26 Apr, 2016 5 commits

RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips · 32cc92c7

Hariprasad S authored Apr 05, 2016

For T4, kernel mode qps don't use the user doorbell. User mode qps during
flow control db ringing are forced into kernel, where user doorbell is
treated as kernel doorbell and proper bar2 offset in bar2 virtual space is
calculated, which incase of T4 is a bogus address, causing a kernel panic
due to illegal write during doorbell ringing.
In case of T4, kernel mode qp bar2 virtual address should be 0. Added T4
check during bar2 virtual address calculation to return 0. Fixed Bar2
range checks based on bar2 physical address.

The below oops will be fixed

  <1>BUG: unable to handle kernel paging request at 000000000002aa08
  <1>IP: [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
  <4>PGD 1416a8067 PUD 15bf35067 PMD 0
  <4>Oops: 0002 [#1] SMP
  <4>last sysfs file:
  /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.4/infiniband/cxgb4_0/node_guid
  <4>CPU 5
  <4>Modules linked in: rdma_ucm rdma_cm ib_cm ib_sa ib_mad ib_uverbs
  ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
  iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
  ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4
  target_core_iblock target_core_file target_core_pscsi target_core_mod
  configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q
  garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf vhost_net macvtap
  macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support sg joydev
  serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core ioatdma dca
  i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi
  ata_generic ata_piix iw_cxgb4 iw_cm ib_core ib_addr cxgb4 ipv6 dm_mirror
  dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
  <4>
  Supermicro X8ST3/X8ST3
  <4>RIP: 0010:[<ffffffffa011d800>]  [<ffffffffa011d800>]
  c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
  <4>RSP: 0000:ffff880155a03db0  EFLAGS: 00010006
  <4>RAX: 000000000000001d RBX: ffff88013ae5fc00 RCX: ffff880155adb180
  <4>RDX: 000000000002aa00 RSI: 0000000000000001 RDI: ffff88013ae5fdf8
  <4>RBP: ffff880155a03e10 R08: 0000000000000000 R09: 0000000000000001
  <4>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  <4>R13: 000000000000001d R14: ffff880156414ab0 R15: ffffe8ffffc05b88
  <4>FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
  <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
  <4>CR2: 000000000002aa08 CR3: 000000015bd0e000 CR4: 00000000000007e0
  <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  <4>Process cxgb4 (pid: 394, threadinfo ffff880155a00000, task ffff880156414ab0)
  <4>Stack:
  <4> ffff880156415068 ffff880155adb180 ffff880155a03df0 ffffffffa00a344b
  <4><d> 00000000000003e8 ffff880155920000 0000000000000004 ffff880155920000
  <4><d> ffff88015592d438 ffffffffa00a3860 ffff880155a03fd8 ffffe8ffffc05b88
  <4>Call Trace:
  <4> [<ffffffffa00a344b>] ? enable_txq_db+0x2b/0x80 [cxgb4]
  <4> [<ffffffffa00a3860>] ? process_db_full+0x0/0xa0 [cxgb4]
  <4> [<ffffffffa00a38a6>] process_db_full+0x46/0xa0 [cxgb4]
  <4> [<ffffffff8109fda0>] worker_thread+0x170/0x2a0
  <4> [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40
  <4> [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0
  <4> [<ffffffff810a660e>] kthread+0x9e/0xc0
  <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
  <4> [<ffffffff810a6570>] ? kthread+0x0/0xc0
  <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
  <4>Code: e9 ba 00 00 00 66 0f 1f 44 00 00 44 8b 05 29 07 02 00 45 85 c0 0f 85
  71 02 00 00 8b 83 70 01 00 00 45 0f b7 ed c1 e0 0f 44 09 e8 <89> 42 08 0f ae f8
  66 c7 83 82 01 00 00 00 00 44 0f b7 ab dc 01
  <1>RIP  [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
  <4> RSP <ffff880155a03db0>
  <4>CR2: 000000000002aa08`

Based on original work by Bharat Potnuri <bharat@chelsio.com>

Fixes: 74217d4c ("iw_cxgb4: support for bar2 qid densities exceeding the page size")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Reviewed-by: Leon Romanovsky <leon@leon.nu>
Signed-off-by: Doug Ledford <dledford@redhat.com>

32cc92c7

iw_cxgb4: handle draining an idle qp · 40edd7fd

Steve Wise authored Apr 12, 2016

In c4iw_drain_sq/rq(), if the particular queue is already empty
then don't block.

Fixes: ce4af14d94aa ('iw_cxgb4: add queue drain functions')
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

40edd7fd

iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping · ad202348

Steve Wise authored Apr 12, 2016

The IWCM uses ibdev.iwcm->ifname for registration with the iwarp
port map daemon.  But iw_cxgb3 did not initialize this field which
causes intermittent registration failures based on the contents of the
uninitialized memory.

Fixes: c1340e8a ("iw_cxgb3: support for iWARP port mapping")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ad202348

iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping · 851d7b6b

Steve Wise authored Apr 12, 2016

The IWCM uses ibdev.iwcm->ifname for registration with the iwarp
port map daemon.  But iw_cxgb4 did not initialize this field which
causes intermittent registration failures based on the contents of the
uninitialized memory.

Fixes: 170003c8 ("iw_cxgb4: remove port mapper related code")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

851d7b6b

IB/core: Don't drain non-existent rq queue-pair · 42235f80

Sagi Grimberg authored Apr 26, 2016

The drain_rq function expects a normal receive qp to drain.  A qp can
only have either a normal rq or an srq.  If there is an srq, there
is no rq to drain.  Until the API supports draining SRQs, simply
skip draining the rq when the qp has an srq attached.

Fixes: 765d6774 ("IB: new common API for draining queues")
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

42235f80

23 Apr, 2016 1 commit

IB/core: Fix oops in ib_cache_gid_set_default_gid · f4e7de63

Doug Ledford authored Apr 22, 2016

When we fail to find the default gid index, we can't continue
processing in this routine or else we will pass a negative
index to later routines resulting in invalid memory access
attempts and a kernel oops.

Fixes: 03db3a2d (IB/core: Add RoCE GID table management)
Signed-off-by: Doug Ledford <dledford@redhat.com>

f4e7de63

06 Apr, 2016 2 commits

i40iw: avoid potential uninitialized variable use · 2fe78571

Arnd Bergmann authored Mar 23, 2016

gcc finds that the i40iw_make_cm_node() function in the recently added
i40iw driver uses an uninitilized variable as an index into an array
if CONFIG_IPV6 is disabled and the driver uses IPv6 mode:

drivers/infiniband/hw/i40iw/i40iw_cm.c: In function 'i40iw_make_cm_node':
drivers/infiniband/hw/i40iw/i40iw_cm.c:2206:52: error: 'arpindex' may be used uninitialized in this function [-Werror=maybe-uninitialized]
ether_addr_copy(cm_node->rem_mac, iwdev->arp_table[arpindex].mac_addr);

As far as I can tell, this code path can not be used because the ipv4
variable is always set with CONFIG_IPV6 is disabled, but it's better
to be sure and prevent the undefined behavior, as well as shut up
that warning in a proper way.

This adds an 'else' clause for the case we get the warning about,
causing the function to return an error in a controlled way.
To avoid adding extra mess with combined io()/#ifdef clauses,
I'm also converting the existing #ifdef into a more readable
if(IS_ENABLED()) check.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: f27b4746 ("i40iw: add connection management code")
Acked-by: Mustafa Ismail <Mustafa.ismail@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2fe78571

IB/mlx5: fix VFs callback function prototypes · 9967c70a

Arnd Bergmann authored Mar 23, 2016

The previous patch that added a couple of callback functions put
the declarations inside of an #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING,
which causes the build to fail if that option is disabled:

drivers/infiniband/hw/mlx5/main.c: In function 'mlx5_ib_add':
drivers/infiniband/hw/mlx5/main.c:2358:31: error: 'mlx5_ib_get_vf_config' undeclared (first use in this function)

This moves the four declarations below the #ifdef section so they
are always available.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: eff901d3 ("IB/mlx5: Implement callbacks for manipulating VFs")
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9967c70a

03 Apr, 2016 5 commits

Linux 4.6-rc2 · 9735a227
Linus Torvalds authored Apr 03, 2016

9735a227

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4c3b73c6

Linus Torvalds authored Apr 03, 2016

Pull perf fixes from Ingo Molnar:
 "Misc kernel side fixes:

   - fix event leak
   - fix AMD PMU driver bug
   - fix core event handling bug
   - fix build bug on certain randconfigs

  Plus misc tooling fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/amd/ibs: Fix pmu::stop() nesting
  perf/core: Don't leak event in the syscall error path
  perf/core: Fix time tracking bug with multiplexing
  perf jit: genelf makes assumptions about endian
  perf hists: Fix determination of a callchain node's childlessness
  perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples
  perf tools: Fix build break on powerpc
  perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL
  perf bench: Fix detached tarball building due to missing 'perf bench memcpy' headers
  perf tests: Fix tarpkg build test error output redirection

4c3b73c6

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7b367f5d

Linus Torvalds authored Apr 03, 2016

Pull core kernel fixes from Ingo Molnar:
 "This contains the nohz/atomic cleanup/fix for the fetch_or() ugliness
  you noted during the original nohz pull request, plus there's also
  misc fixes:

   - fix liblockdep build bug
   - fix uapi header build bug
   - print more lockdep hash collision info to help debug recent reports
     of hash collisions
   - update MAINTAINERS email address"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  MAINTAINERS: Update my email address
  locking/lockdep: Print chain_key collision information
  uapi/linux/stddef.h: Provide __always_inline to userspace headers
  tools/lib/lockdep: Fix unsupported 'basename -s' in run_tests.sh
  locking/atomic, sched: Unexport fetch_or()
  timers/nohz: Convert tick dependency mask to atomic_t
  locking/atomic: Introduce atomic_fetch_or()

7b367f5d

v4l2-mc: avoid warning about unused variable · 17084b7e

Linus Torvalds authored Apr 03, 2016

Commit 840f5b05 ("media: au0828 disable tuner to demod link in
au0828_media_device_register()") removed all uses of the 'dtv_demod',
but left the variable itself around.

Remove it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17084b7e

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 30cebb6c

Linus Torvalds authored Apr 03, 2016

Pull x86 fixes from Thomas Gleixner:
 "This lot contains:

   - Some fixups for the fallout of the topology consolidation which
     unearthed AMD/Intel inconsistencies
   - Documentation for the x86 topology management
   - Support for AMD advanced power management bits
   - Two simple cleanups removing duplicated code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add advanced power management bits
  x86/thread_info: Merge two !__ASSEMBLY__ sections
  x86/cpufreq: Remove duplicated TDP MSR macro definitions
  x86/Documentation: Start documenting x86 topology
  x86/cpu: Get rid of compute_unit_id
  perf/x86/amd: Cleanup Fam10h NB event constraints
  x86/topology: Fix AMD core count

30cebb6c

02 Apr, 2016 8 commits

Merge tag 'rproc-v4.6-rc1' of git://github.com/andersson/remoteproc · f7eeb8a8

Linus Torvalds authored Apr 02, 2016

Pull remoteproc fix from Bjorn Andersson:
 "Fix incorrect error check in the ST remoteproc driver and advertise
  the newly created linux-remoteproc mailing list"

* tag 'rproc-v4.6-rc1' of git://github.com/andersson/remoteproc:
  MAINTAINERS: Add mailing list for remote processor subsystems
  remoteproc: st: fix check of syscon_regmap_lookup_by_phandle() return value

f7eeb8a8

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · d6c24df0

Linus Torvalds authored Apr 02, 2016

Pull SCSI target fixes from Nicholas Bellinger:
 "This includes fixes from HCH for -rc1 configfs default_groups
  conversion changes that ended up breaking some iscsi-target
  default_groups, along with Sagi's ib_drain_qp() conversion for
  iser-target to use the common caller now available to RDMA kernel
  consumers in v4.6+ code"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: add a new add_wwn_groups fabrics method
  target: initialize the nacl base CIT begfore init_nodeacl
  target: remove ->fabric_cleanup_nodeacl
  iser-target: Use ib_drain_qp

d6c24df0

Convert straggling drivers to new six-argument get_user_pages() · cb107161

Linus Torvalds authored Apr 02, 2016

Commit d4edcf0d ("mm/gup: Switch all callers of get_user_pages() to
not pass tsk/mm") switched get_user_pages() callers to the simpler model
where they no longer pass in the thread and mm pointer. But since then
we've merged changes to a few drivers that re-introduce use of the old
interface. Let's fix them up.

They continued to work fine (thanks to the truly disgusting macros
introduced in commit cde70140: "mm/gup: Overload get_user_pages()
functions"), but cause unnecessary build noise.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cb107161

Merge tag 'configfs-for-linus-2' of git://git.infradead.org/users/hch/configfs · 264800b5

Linus Torvalds authored Apr 02, 2016

Pull configfs fix from Christoph Hellwig:
 "A trivial fix to the recently introduced binary attribute helper
  macros"

* tag 'configfs-for-linus-2' of git://git.infradead.org/users/hch/configfs:
  configfs: fix CONFIGFS_BIN_ATTR_[RW]O definitions

264800b5

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 05cf8077

Linus Torvalds authored Apr 01, 2016

Pull networking fixes from David Miller:

 1) Missing device reference in IPSEC input path results in crashes
    during device unregistration.  From Subash Abhinov Kasiviswanathan.

 2) Per-queue ISR register writes not being done properly in macb
    driver, from Cyrille Pitchen.

 3) Stats accounting bugs in bcmgenet, from Patri Gynther.

 4) Lightweight tunnel's TTL and TOS were swapped in netlink dumps, from
    Quentin Armitage.

 5) SXGBE driver has off-by-one in probe error paths, from Rasmus
    Villemoes.

 6) Fix race in save/swap/delete options in netfilter ipset, from
    Vishwanath Pai.

 7) Ageing time of bridge not set properly when not operating over a
    switchdev device.  Fix from Haishuang Yan.

 8) Fix GRO regression wrt nested FOU/GUE based tunnels, from Alexander
    Duyck.

 9) IPV6 UDP code bumps wrong stats, from Eric Dumazet.

10) FEC driver should only access registers that actually exist on the
    given chipset, fix from Fabio Estevam.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
  net: mvneta: fix changing MTU when using per-cpu processing
  stmmac: fix MDIO settings
  Revert "stmmac: Fix 'eth0: No PHY found' regression"
  stmmac: fix TX normal DESC
  net: mvneta: use cache_line_size() to get cacheline size
  net: mvpp2: use cache_line_size() to get cacheline size
  net: mvpp2: fix maybe-uninitialized warning
  tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
  net: usb: cdc_ncm: adding Telit LE910 V2 mobile broadband card
  rtnl: fix msg size calculation in if_nlmsg_size()
  fec: Do not access unexisting register in Coldfire
  net: mvneta: replace MVNETA_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
  net: mvpp2: replace MVPP2_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
  net: dsa: mv88e6xxx: Clear the PDOWN bit on setup
  net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read, write}
  bpf: make padding in bpf_tunnel_key explicit
  ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates
  bnxt_en: Fix ethtool -a reporting.
  bnxt_en: Fix typo in bnxt_hwrm_set_pause_common().
  bnxt_en: Implement proper firmware message padding.
  ...

05cf8077

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · cf78031a

Linus Torvalds authored Apr 01, 2016

Pull clk fixes from Stephen Boyd:
 "A handful of const updates for reset ops and a couple fixes to the
  newly introduced IPQ4019 clock driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: ipq4019: add some fixed clocks for ddrppl and fepll
  clk: qcom: ipq4019: switch remaining defines to enums
  clk: qcom: Make reset_control_ops const
  clk: tegra: Make reset_control_ops const
  clk: sunxi: Make reset_control_ops const
  clk: atlas7: Make reset_control_ops const
  clk: rockchip: Make reset_control_ops const
  clk: mmp: Make reset_control_ops const
  clk: mediatek: Make reset_control_ops const

cf78031a

Merge tag 'pm+acpi-4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1826907c

Linus Torvalds authored Apr 01, 2016

Pull power management and ACPI fix from Rafael J. Wysocki:
 "Just one fix for a nasty boot failure on some systems based on Intel
  Skylake that shipped with broken firmware where enabling
  hardware-coordinated P-states management (HWP) causes a faulty
  interrupt handler in SMM to be invoked and crash the system (Srinivas
  Pandruvada)"

* tag 'pm+acpi-4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / processor: Request native thermal interrupt handling via _OSC

1826907c

Merge branch 'akpm' (patches from Andrew) · 4e19fd93

Linus Torvalds authored Apr 01, 2016

Merge fixes from Andrew Morton:
 "11 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  .mailmap: add Christophe Ricard
  Make CONFIG_FHANDLE default y
  mm/page_isolation.c: fix the function comments
  oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head
  mm/page_isolation: fix tracepoint to mirror check function behavior
  mm/rmap: batched invalidations should use existing api
  x86/mm: TLB_REMOTE_SEND_IPI should count pages
  mm: fix invalid node in alloc_migrate_target()
  include/linux/huge_mm.h: return NULL instead of false for pmd_trans_huge_lock()
  mm, kasan: fix compilation for CONFIG_SLAB
  MAINTAINERS: orangefs mailing list is subscribers-only

4e19fd93

01 Apr, 2016 1 commit

Merge branch 'acpi-processor' · 8fbd4ade

Rafael J. Wysocki authored Apr 02, 2016

* acpi-processor:
  ACPI / processor: Request native thermal interrupt handling via _OSC

8fbd4ade