Commits · a9b6b3bc295d2360480d32049c32661e809c7c5c · nexedi / linux

02 Aug, 2016 40 commits

IB/hfi1: Rename struct ahg_ib_header to struct hfi1_ahg_info · a9b6b3bc

Dasaratharaman Chandramouli authored Jul 25, 2016

struct ahg_ib_header has no header specific information.
Rename it to struct hfi1_ahg_info
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a9b6b3bc

IB/hfi1: Remove unused elements from struct ahg_ib_header · bd24ef5e

Dasaratharaman Chandramouli authored Jul 25, 2016

sde and hfi1_ib_header are not used anymore.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bd24ef5e

IB/hfi1: Reset QSFP on every run through channel tuning · b5e71019

Easwar Hariharan authored Jul 25, 2016

Active QSFP cables were reset only every alternate iteration of the
channel tuning algorithm instead of every iteration due to incorrect
reset of the flag that controlled QSFP reset, resulting in using stale
QSFP status in the channel tuning algorithm.

Fixes: 8ebd4cf1 ("Add active and optical cable support")
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b5e71019

IB/hfi1: Ignore QSFP interrupts until power stabilizes · 5fbd98dd

Easwar Hariharan authored Jul 25, 2016

Some QSFP cables assert the interrupt line as a side effect of module
plug-in and power up. This causes the SerDes and QSFP tuning algorithm
to begin cable initialization by reading the QSFP memory map over I2C,
which fails. This patch ignores any interrupt line assertion until
the module has completed power up and voltage rails have stabilized,
which can take a maximum of 500 ms per the SFF-8679 specification.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5fbd98dd

IB/hfi1: Disable external device configuration requests · 3ca5f4c0

Easwar Hariharan authored Jul 25, 2016

QSFP CDR enablement is now controlled by determining power class
and the configuration file. We disable the DC 8051 from requesting
enablement or disabling of TX and RX CDRs by removing the code
that allowed the DC 8051 to request changes.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3ca5f4c0

IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled · d9b13c20

Jianxin Xiong authored Jul 25, 2016

Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
the server contains messages like these:

[  931.992501] svcrdma: Error -22 posting RDMA_READ
[  952.076879] svcrdma: Error -22 posting RDMA_READ
[  982.154127] svcrdma: Error -22 posting RDMA_READ
[ 1012.235884] svcrdma: Error -22 posting RDMA_READ
[ 1042.319194] svcrdma: Error -22 posting RDMA_READ

Here is why:

With the base memory management extension enabled, FRMR is used instead
of FMR. The xprtrdma server issues each RDMA read request as the following
bundle:

(1)IB_WR_REG_MR, signaled;
(2)IB_WR_RDMA_READ, signaled;
(3)IB_WR_LOCAL_INV, signaled & fencing.

These requests are signaled. In order to generate completion, the fast
register work request is processed by the hfi1 send engine after being
posted to the work queue, and the corresponding lkey is not valid until
the request is processed. However, the rdmavt driver validates lkey when
the RDMA read request is posted and thus it fails immediately with error
-EINVAL (-22).

This patch changes the work flow of local operations (fast register and
local invalidate) so that fast register work requests are always
processed immediately to ensure that the corresponding lkey is valid
when subsequent work requests are posted. Local invalidate requests are
processed immediately if fencing is not required and no previous local
invalidate request is pending.

To allow completion generation for signaled local operations that have
been processed before posting to the work queue, an internal send flag
RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
and only generates completion for such requests.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d9b13c20

IB/hfi1: Add the capability for reserved operations · 856cc4c2

Mike Marciniszyn authored Jul 25, 2016

This fix allows for support of in-kernel reserved operations
without impacting the ULP user.

The low level driver can register a non-zero value which
will be transparently added to the send queue size and hidden
from the ULP in every respect.

ULP post sends will never see a full queue due to a reserved
post send and reserved operations will never exceed that
registered value.

The s_avail will continue to track the ULP swqe availability
and the difference between the reserved value and the reserved
in use will track reserved availabity.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

856cc4c2

IB/hfi1: Fix trace message units · 23002d5b

Grzegorz Heldt authored Jul 25, 2016

Trace shows incorrect amount of allocated memory.
Fix trace to display memory in KB.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Grzegorz Heldt <grzegorz.heldt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

23002d5b

IB/hfi1: Add sysfs entry to override SDMA interrupt affinity · b14db1f0

Tadeusz Struk authored Jul 25, 2016

Add sysfs entry to allow user to override affinity for SDMA
engine interrupts.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b14db1f0

IB/hfi1: Add static PCIe Gen3 CTLE tuning · c3f8de0b

Dean Luick authored Jul 25, 2016

Enhance the PCIe Gen3 recipe to support static CTLE tuning,
and add a switch to choose between static and dynamic
approaches.  Make discrete chips default to static CTLE
tuning.
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c3f8de0b

IB/hfi1: Fix "suspicious rcu_dereference_check() usage" warnings · 8adf71fa

Jianxin Xiong authored Jul 25, 2016

This fixes the following warnings with PROVE_LOCKING and PROVE_RCU
enabled in the kernel:

case (1):
[ INFO: suspicious RCU usage. ]
drivers/infiniband/hw/hfi1/init.c:532
suspicious rcu_dereference_check() usage!

case (2):
[ INFO: suspicious RCU usage. ]
drivers/infiniband/hw/hfi1/hfi.h:1624
suspicious rcu_dereference_check() usage!
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

8adf71fa

IB/rdmavt: Add missing spin_lock_init call for rdi->n_cqs_lock · a6580f43

Jianxin Xiong authored Jul 25, 2016

This fixes the following warning with PROV_LOCKING enabled kernel:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 15 PID: 12286 Comm: modprobe Not tainted 4.7.0-rc5.prove_rcu+ #1
Hardware name: Intel Corporation S2600WT2R/S2600WT2R,
......
Call Trace:
[<ffffffff8139ec0d>] dump_stack+0x85/0xc8
[<ffffffff810eb765>] register_lock_class+0x415/0x4b0
[<ffffffff810ede1c>] ? __lock_acquire+0x40c/0x1960
[<ffffffff810edaa9>] __lock_acquire+0x99/0x1960
[<ffffffff8120ab62>] ? find_vmap_area+0x42/0x60
[<ffffffff8120ab39>] ? find_vmap_area+0x19/0x60
[<ffffffff810ef9d3>] lock_acquire+0xd3/0x200
[<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffff81763391>] _raw_spin_lock+0x31/0x40
[<ffffffffa049d598>] ? rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffffa049d598>] rvt_create_cq+0xc8/0x250 [rdmavt]
[<ffffffff810ead46>] ? static_obj+0x36/0x50
[<ffffffffa0469e39>] ib_alloc_cq+0x49/0x180 [ib_core]
[<ffffffffa047bed4>] ib_mad_init_device+0x204/0x6d0 [ib_core]
[<ffffffff810e968f>] ? up_write+0x1f/0x40
[<ffffffffa046e2c0>] ib_register_device+0x3d0/0x510 [ib_core]
[<ffffffffa0752410>] ? read_cc_setting_bin+0x200/0x200 [hfi1]
[<ffffffff810ead46>] ? static_obj+0x36/0x50
[<ffffffff810eb888>] ? lockdep_init_map+0x88/0x200
[<ffffffffa049cbff>] rvt_register_device+0x17f/0x320 [rdmavt]
[<ffffffffa0766caa>] hfi1_register_ib_device+0x6ca/0x7c0 [hfi1]
[<ffffffffa0733de4>] init_one+0x2b4/0x430 [hfi1]
[<ffffffff813e40a5>] local_pci_probe+0x45/0xa0
[<ffffffff813e5110>] ? pci_match_device+0xe0/0x110
[<ffffffff813e550c>] pci_device_probe+0xfc/0x140
[<ffffffff814daee9>] driver_probe_device+0x239/0x460
[<ffffffff814db1dd>] __driver_attach+0xcd/0xf0
[<ffffffff814db110>] ? driver_probe_device+0x460/0x460
[<ffffffff814d89b3>] bus_for_each_dev+0x73/0xc0
[<ffffffff814da74e>] driver_attach+0x1e/0x20
[<ffffffff814da1b3>] bus_add_driver+0x1d3/0x290
[<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1]
[<ffffffff814dbf60>] driver_register+0x60/0xe0
[<ffffffffa04cc114>] ? dev_init+0x114/0x114 [hfi1]
[<ffffffff813e39d0>] __pci_register_driver+0x60/0x70
[<ffffffffa04cc2aa>] hfi1_mod_init+0x196/0x1fe [hfi1]
[<ffffffff81002190>] do_one_initcall+0x50/0x190
[<ffffffff8110be72>] ? rcu_read_lock_sched_held+0x62/0x70
[<ffffffff8122d4aa>] ? kmem_cache_alloc_trace+0x23a/0x2a0
[<ffffffff811c1881>] ? do_init_module+0x27/0x1dc
[<ffffffff811c18ba>] do_init_module+0x60/0x1dc
[<ffffffff811360cc>] load_module+0x132c/0x1ac0
[<ffffffff81132c40>] ? __symbol_put+0x60/0x60
[<ffffffff8133e50d>] ? ima_post_read_file+0x3d/0x80

Cc: Stable <stable@vger.kernel.org> # 4.6+
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a6580f43

IB/hfi1: Read all firmware versions · b3bf270b

Dean Luick authored Jul 25, 2016

Read the version of the SBus, PCIe SerDes, and Fabric Serdes
firmwares at driver load time.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b3bf270b

IB/hfi1: Explain state complete frame details · 6854c692

Dean Luick authored Jul 25, 2016

When link up fails in LNI, the local and peer state complete
frames are reported as numbers.  Explain what the values mean
so the operator can better diagnose the problem.
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6854c692

IB/hfi1: Modify the default number of kernel receive conexts · 8784ac02

Harish Chegondi authored Jul 25, 2016

Currently, the default number of kernel receive contexts is set to the
number of NUMA nodes on the system plus one for control context. However,
the systems that have a single socket and/or have NUMA disabled in the BIOS
will have only one receive context by default. This patch would ensure that
by default there will be at least two kernel receive contexts plus one for
control context regardless of the number of NUMA nodes on the system. The
user can override the default number of kernel receive contexts with the
krcvqs module parameter.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

8784ac02

IB/hfi1: Add support for extended memory management · c72cfe3e

Jianxin Xiong authored Jul 25, 2016

Advertise and add the capability of handing all aspects of IBTA extended
memory management support in post send.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c72cfe3e

IB/hfi1: Work request processing for fast register mr and invalidate · 0db3dfa0

Jianxin Xiong authored Jul 25, 2016

In order to support extended memory management support, add send side
processing of work requests of type IB_WR_REG_MR, IB_WR_LOCAL_INV, and
IB_WR_SEND_WITH_INV. The first two are local operations and are supported
for both RC and UC. Send with invalidate is only supported for RC because
the corresponding IB opcodes are not defined for UC.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0db3dfa0

IB/hfi1: Handle send with invalidate opcode in the RC recv path · a2df0c83

Jianxin Xiong authored Jul 25, 2016

As part of enabling extended memory management support, add the processing
of the RC send with invalidate.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a2df0c83

IB/rdmavt: Handle local operations in post send · d9f87239

Jianxin Xiong authored Jul 25, 2016

Some work requests are local operations, such as IB_WR_REG_MR and
IB_WR_LOCAL_INV. They differ from non-local operations in that:

(1) Local operations can be processed immediately without being posted
to the send queue if neither fencing nor completion generation is needed.
However, to ensure correct ordering, once a local operation is posted to
the work queue due to fencing or completion requiement, all subsequent
local operations must also be posted to the work queue until all the
local operations on the work queue have completed.

(2) Local operations don't send packets over the wire and thus don't
need (and shouldn't update) the packet sequence numbers.

Define a new a flag bit for the post send table to identify local
operations.

Add a new field to the QP structure to track the number of local
operations on the send queue to determine if direct processing of new
local operations should be enabled/disabled.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d9f87239

IB/rdmavt: Add mechanism to invalidate MR keys · e8f8b098

Jianxin Xiong authored Jul 25, 2016

In order to support extended memory management, add the mechanism to
invalidate MR keys. This includes a flag "lkey_invalid" in the MR data
structure that is to be checked when validating access to the MR via
the associated key, and two utility functions to perform fast memory
registration and memory key invalidate operations.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e8f8b098

IB/rdmavt: Add support for ib_map_mr_sg · a41081aa

Jianxin Xiong authored Jul 25, 2016

This implements the device specific function needed by the verbs
API function ib_map_mr_sg().
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a41081aa

IB/hfi1: Pull FECN/BECN processing to a common place · 5fd2b562

Mitko Haralanov authored Jul 25, 2016

There were multiple places where FECN/BECN processing was
being done for the different types of QPs. All of that code
was very similar, which meant that it could be pulled into
a single function used by the different QP types.

To retain the performance in the fastpath, the common code
starts with an inline function, which only calls the slow
path if the packet has any of the [FB]ECN bits set.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5fd2b562

IB/hfi1: Fix to fully initialize send context area · 1b23f02c

Tymoteusz Kielan authored Jul 25, 2016

While handling buffer control MAD, partially initialized
dd->kernel_send_context area may cause potential dereference
of uninitialized pointers. Fix by using kzalloc_node()
instead of kmalloc_node().
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

1b23f02c

IB/hfi1: Fix integrity errors counter value calculation · 3210314a

Jakub Pawlak authored Jul 25, 2016

PMA should not sum TX and RX replay counts when reporting
local link integrity errors. Fixed by removing C_DC_TX_REPLAY
counter from calculation of the link integrity errors counter
value.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3210314a

IB/rdmavt: Use new driver specific post send table · 2821c509

Mike Marciniszyn authored Jul 01, 2016

Change rvt_post_one_wr to use the new table mechanism for
post send.

Validate that each low level driver specifies the table.
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2821c509

IB/qib: Add qib post send table · 9ec4faa3

Mike Marciniszyn authored Jul 01, 2016

Add initial table for table driven post_send support.
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9ec4faa3

IB/hfi1: Add hfi1 post send tables · 1ac57c50

Mike Marciniszyn authored Jul 01, 2016

Add initial table for table driven post_send support.
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

1ac57c50

IB/rdmavt: Add data structures and routines for table driven post send · afcf8f76

Mike Marciniszyn authored Jul 01, 2016

Add flexibility for driver dependent operations in post send
because different drivers will have differing post send
operation support.

This includes data structure definitions to support a table
driven scheme along with the necessary validation routine
using the new table.
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

afcf8f76

IB/hfi1: Correct receive packet handler assignment · 71e68e3d

Jakub Pawlak authored Jul 01, 2016

Prevent processing receive packet in case when opcode is
accepted by QP but handler for this type of packet is not
defined.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

71e68e3d

IB/hfi1: Improve SDMA engine assignment for user SDMA · 14833b8c

Jianxin Xiong authored Jul 01, 2016

Currently each user context is assigned a single SDMA engine
based on the VL, context id, and subcontext id. That means for
MPI applications, each rank can only use one SDMA engine for
all messages. This may create unwanted backup for independent
messages going to different destinations upon congestion at one
destination.

This patch adds the packet "dlid" to the formula of SDMA engine
selection for user SDMA requests. A simple hash table is used
to maintain even distribution among the available SDMA engines
regardless how the "dlid" values are distributed.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

14833b8c

IB/hfi1: Remove TWSI references · e014991d

Dean Luick authored Jul 01, 2016

Remove the TWSI code.  The driver now uses the kernel's built-in
i2c bit bus module.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e014991d

IB/hfi1: Use built-in i2c bit-shift bus adapter · dba715f0

Dean Luick authored Jul 06, 2016

Use built-in i2c bit-shift bus adapter to control the
i2c busses on the chip.

Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

dba715f0

IB/hfi1: Refine user process affinity algorithm · b094a36f

Sebastian Sanchez authored Jul 25, 2016

When performing process affinity recommendations for MPI ranks, the current
algorithm doesn't take into account multiple HFI units. Also, real
cores and HT cores are not distinguished from one another. Therefore,
all HT cores are recommended to be assigned first within the local NUMA
node before recommending the assignments of cores in other NUMA nodes.
It's ideal to assign all real cores across all NUMA nodes first, then all
HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU
cores in other NUMA nodes could be running interrupt handlers, and this is
not taken into account.

To balance the CPU workload for user processes, the following
recommendation algorithm is used:

 For each user process that is opening a context on HFI Y:
  a) If all cores are assigned to user processes, start assignments all
	 over from the first core
  b) Assign real cores first, then HT cores (First set of HT cores on
	 all physical cores, then second set of HT cores, and, so on) in the
	 following order:

	 1. Same NUMA node as HFI Y and not running an IRQ handler
	 2. Same NUMA node as HFI Y and running an IRQ handler
	 3. Different NUMA node to HFI Y and not running an IRQ handler
	 4. Different NUMA node to HFI Y and running an IRQ handler
  c) Mark core as assigned in the global affinity structure. As user
	 processes are done, remove core assignments from global affinity
	 structure.

This implementation allows an arbitrary number of HT cores and provides
support for multiple HFIs.

This is being included in the kernel rather than user space due to the
fact that user space has no way of knowing the CPU recommendations for
contexts running as part of other jobs.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b094a36f

IB/hfi1: Reserve and collapse CPU cores for contexts · d6373019

Sebastian Sanchez authored Jul 25, 2016

Kernel receive queues oversubscribe CPU cores on multi-HFI systems.
To prevent this, the kernel receive queues are separated onto
different cores, and the SDMA engine interrupts are constrained to
a lesser number of cores.

hfi1s_on_numa_node*krcvqs is the number of CPU cores that are
reserved for kernel receive queues for all HFIs. Each HFI initializes
its kernel receive queues to one of the reserved CPU cores. If there
ends up being 0 CPU cores leftover for SDMA engines, use the same
CPU cores as receive contexts.

In addition, general and control contexts are assigned to their own
CPU core, however, both types of contexts tend to have low traffic.
To save CPU cores, collapse general and control contexts to one CPU
core for all HFI units. This change prevents SDMA engine interrupts
from wrapping around general contexts.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d6373019

IB/hfi1: Add global structure for affinity assignments · 4197344b

Dennis Dalessandro authored Jul 25, 2016

When HFI units get initialized, they each use their own mask copy for
affinity assignments. On a multi-HFI system, affinity assignments
overbook CPU cores as each HFI doesn't have knowledge of affinity
assignments for other HFI units. Therefore, some CPU cores are never
used for interrupt handlers in systems with high number of CPU cores
per NUMA node.

For multi-HFI systems, SDMA engine interrupt assignments start all over
from the first CPU in the local NUMA node after the first HFI
initialization. This change allows assignments to continue where the
last HFI unit left off.

Add global structure for affinity assignments for multiple HFIs to share
affinity mask.
Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4197344b

IB/hfi1: Add counter to track unsupported packets drop · 2b719046

Jakub Pawlak authored Jul 01, 2016

Add sw counter to track dropped unsupported packets.
Report unsupported packets drop as the RcvError.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2b719046

IB/hfi1: Add VL XmitDiscards counters to the opapmaquery · 583eb8b8

Jakub Pawlak authored Jul 01, 2016

Add per VL XmitDiscards counters to the opapmaquery
status and error response.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

583eb8b8

IB/hfi1: Fix trace sparse errors · ad421082

Mike Marciniszyn authored Jul 01, 2016

Fix sparse errors by making sure the fast assign destinations
are host cpu typed.

For the void __iomem *, just make the field match source
data.

Fix a bug where the hw_free trace printed the pointer vs.
the dereferenced value.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ad421082

IB/hfi1: Separate tracepoints into specific headers · 462b6b21

Sebastian Sanchez authored Jul 01, 2016

The ftrace infrastructure used to evaluate the TRACE_SYSTEM
macro on every DEFINE_EVENT() macro. Now the TRACE_SYSTEM
macro only gets evaluated when trace/define_trace.h is
included, so the group event information is lost. This was
introduced in
commit acd388fd ("tracing: Give system name a pointer")
Therefore, each system tracepoint must be on its own file.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

462b6b21

IB/hfi1: Fix typo · 21a4c95d

Tadeusz Struk authored Jul 01, 2016

Fix a copy and paste typo in comment.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

21a4c95d