Commits · cc169165c82e14ea43e313f937a0a475ca97e588 · Kirill Smelkov / linux

21 May, 2012 2 commits
- Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes',... · cc169165
  Roland Dreier authored May 21, 2012
```
Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', 'ocrdma', 'qib' and 'raw-qp' into for-linus
```
  cc169165
- RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree · e572568f
  Vipul Pandya authored May 21, 2012
```
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
```
  e572568f
19 May, 2012 4 commits

IB/mlx4: Fix mlx4_ib_add() error flow · 035b1032

Jack Morgenstein authored May 10, 2012

We need to use a different loop index for mlx4_counter_alloc() and for
device_create_file() iterations: the mlx4_counter_alloc() loop index
is used in the error flow to free counters.

If the same loop index is used for device_create_file() and, say, the
device_create_file() loop fails on the first iteration, the allocated
counters will not be freed.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

035b1032

IB/core: Fix IB_SA_COMP_MASK macro · 02daaf27

Jack Morgenstein authored May 10, 2012

It needs parentheses around the argument, so that it can be used with
complex arguments (e.g., "n+5").
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

02daaf27

IB/iser: Fix error flow in iser ep connection establishment · 7d9c0de4

Or Gerlitz authored Apr 29, 2012

The current error flow code was releasing the IB connection object and
calling iscsi_destroy_endpoint() directly without going through the
reference counting mechanism introduced in commit 39ff05db ("IB/iser:
Enhance disconnection logic for multi-pathing"). This resulted in a
double free of the iscsi endpoint object, which causes a kernel NULL
pointer dereference.  Fix that by plugging into the IB conn reference
counting correctly.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

7d9c0de4

IB/mlx4: Increase the number of vectors (EQs) available for ULPs · e605b743

Shlomo Pongratz authored Apr 29, 2012

Enable IB ULPs to use a larger portion of the device EQs (which map to
IRQs). The mlx4_ib driver follows the mlx4_core framework of the EQs
to be divided among the device ports. In this scheme, for each IB
port, the number of allocated EQs follows the number of cores, subject
to other system constraints, such as number available MSI-X vectors.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

e605b743

18 May, 2012 10 commits

RDMA/cxgb4: Add query_qp support · 67bbc055

Vipul Pandya authored May 18, 2012

This allows querying the QP state before flushing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

67bbc055

RDMA/cxgb4: Remove kfifo usage · ec3eead2

Vipul Pandya authored May 18, 2012

Using kfifos for ID management was limiting the number of QPs and
preventing NP384 MPI jobs.  So replace it with a simple bitmap
allocator.

Remove IDs from the IDR tables before deallocating them.  This bug was
causing the BUG_ON() in insert_handle() to fire because the ID was
getting reused before being removed from the IDR table.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

ec3eead2

RDMA/cxgb4: Use vmalloc() for debugfs QP dump · d716a2a0

Vipul Pandya authored May 18, 2012

This allows dumping thousands of QPs.  Log active open failures of
interest.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

d716a2a0

RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues · 422eea0a

Vipul Pandya authored May 18, 2012

Add module option db_fc_threshold which is the count of active QPs
that trigger automatic db flow control mode.  Automatically transition
to/from flow control mode when the active qp count crosses
db_fc_theshold.

Add more db debugfs stats

On DB DROP event from the LLD, recover all the iwarp queues.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

422eea0a

RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch() · 4984037b

Vipul Pandya authored May 18, 2012

Use GFP_ATOMIC in _insert_handle() if ints are disabled.

Don't panic if we get an abort with no endpoint found.  Just log a
warning.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

4984037b

RDMA/cxgb4: Add DB Overflow Avoidance · 2c974781

Vipul Pandya authored May 18, 2012

Get FULL/EMPTY/DROP events from LLD.  On FULL event, disable normal
user mode DB rings.

Add modify_qp semantics to allow user processes to call into the
kernel to ring doobells without overflowing.

Add DB Full/Empty/Drop stats.

Mark queues when created indicating the doorbell state.

If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.

Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can
know if the driver supports the kernel mode db ringing.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

2c974781

RDMA/cxgb4: Add debugfs RDMA memory stats · 8d81ef34

Vipul Pandya authored May 18, 2012

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

8d81ef34

cxgb4: DB Drop Recovery for RDMA and LLD queues · 3069ee9b

Vipul Pandya authored May 18, 2012

recover LLD EQs for DB drop interrupts.  This includes adding a new
db_lock, a spin lock disabling BH too, used by the recovery thread and
the ring_tx_db() paths to allow db drop recovery.

Clean up initial DB avoidance code.

Add read_eq_indices() - this allows the LLD to use the PCIe mw to
efficiently read hw eq contexts.

Add cxgb4_sync_txq_pidx() - called by iw_cxgb4 to sync up the sw/hw
pidx value.

Add flush_eq_cache() and cxgb4_flush_eq_cache().  This allows iw_cxgb4
to flush the sge eq context cache before beginning db drop recovery.

Add module parameter, dbfoifo_int_thresh, to allow tuning the db
interrupt threshold value.

Add dbfifo_int_thresh to cxgb4_lld_info so iw_cxgb4 knows the threshold.

Add module parameter, dbfoifo_drain_delay, to allow tuning the amount
of time delay between DB FULL and EMPTY upcalls to iw_cxgb4.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

3069ee9b

cxgb4: Common platform specific changes for DB Drop Recovery · 8caa1e84

Vipul Pandya authored May 18, 2012

Add platform-specific callback functions for interrupts.  This is
needed to do a single read-clear of the CAUSE register and then call
out to platform specific functions for DB threshold interrupts and DB
drop interrupts.

Add t4_mem_win_read_len() - mem-window reads for arbitrary lengths.
This is used to read the CIDX/PIDX values from EC contexts during DB
drop recovery.

Add t4_fwaddrspace_write() - sends addrspace write cmds to the fw.
Needed to flush the sge eq context cache.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

8caa1e84

cxgb4: Detect DB FULL events and notify RDMA ULD · 881806bc

Vipul Pandya authored May 18, 2012

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

881806bc

15 May, 2012 2 commits

RDMA/cxgb4: Drop peer_abort when no endpoint found · 14b92228

Steve Wise authored Apr 30, 2012

Log a warning and drop the abort message.  Otherwise we will do a
bogus wake_up() and crash.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

14b92228

RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr() · 0f1dcfae

Steve Wise authored Apr 27, 2012

This fixes a race where an ingress abort fails to wake up the thread
blocked in rdma_init() causing the app to hang.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

0f1dcfae

14 May, 2012 12 commits

mlx4_core: Change bitmap allocator to work in round-robin fashion · f4ec9e95

Jack Morgenstein authored May 10, 2012

Under most circumstances, the bitmap allocator does not allocate the
same full 24-bit QP number immediately after a QP is destroyed.

This works by using the upper bits of a 24-bit QP number, beyond the
number of QPs that are actually available in the low level driver.
For example, say that the HCA is willing to allocate a maximum of 64K
qps.  We use the bits 23..16 as a "counter" which is incremented by 1
at each allocation so that even if the same physical QP is
re-allocated, it will not receive the same 24-bit QP number.

However, we have seen the following scenario:
1. Allocate, say, 255 QPs in succession.  This will cause a wrap of the "counter".
2. Destroy the first QP allocated, then allocate a new QP.  The new QP,
   because of the counter wraparound, will get the same FULL QP number as
   the QP just destroyed!

This is a problem because packets in transit can be erroneously
delivered to the new QP when they were meant for the old (destroyed)
QP, because the full QP number of the new QP is identical to the
destroyed QP.  (The "counter" mechanism is meant to prevent this by
having the full 24-bit QP numbers differ even if the physical QP on
the HCA is the same.  As we see above, however, this mechanism does
not always work).

The best fix for this problem is to allocate QPs in round-robin mode,
so that the physical QP numbers are not immediately re-used.
Found-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>

f4ec9e95

RDMA/nes: Don't call event handler if pointer is NULL · 784d135f

Tatyana Nikolova authored May 11, 2012

Don't call the ibqp event_handler pointer in the case it wasn't initialized.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Donald Wood <Donald.E.Wood@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

784d135f

RDMA/nes: Fix for the ORD value of the connecting peer · d3e51328

Tatyana Nikolova authored May 11, 2012

Set ORD value of the connecting peer to be at least one in order to
accommodate an RDMA READ Request message.
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Donald Wood <Donald.E.Wood@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

d3e51328

IB/qib: Add cache line awareness to qib_qp and qib_devdata structures · 1c94283d

Mike Marciniszyn authored May 07, 2012

This patch reorganizes the QP and devdata files to be more cache line aware.

qib_qp fields in particular are split into read-mostly, send, and receive fields.

qib_devdata fields are split into read-mostly and read/write fields

Testing has show that bidirectional tests improve by as much as 100%
with this patch.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

1c94283d

IB/qib: MADs with misset M_Keys should return failure · 3236b2d4

Jim Foraker authored May 02, 2012

If a MAD is sent directly to the local HCA rather than placed on a QP
and the MAD fails M_Key checks, there is no means to generate a
timeout for the client, which may hang.  Instead we report
IB_MAD_RESULT_FAILURE, which operates the same for on-the-wire
packets, but will generate a send failure back to the client.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>

3236b2d4

IB/qib: Fix M_Key lease timeout handling · 6199c896

Jim Foraker authored May 02, 2012

If a port has an M_Key lease set, the M_Key protect bits set to 1, and
a SubnSet arrives with an invalid M_Key, an M_Key mismatch trap is
generated, the lease timer begins as expected, and eventually the
M_Key protect bits will be set back to 0 as per the spec. However, if
any other SMP with an invalid M_Key arrives, the lease timer is expired
and the M_Key protect bits remain in force.

This is not according to to spec. In particular, C14-17 says that
a lease timer that is underway is not affected by protection level
checks (ie, at protection level 1, a SubnGet with a bad M_Key may be
successful, but does not stop the timer), and C14-19 says that the timer
shall stop when a valid M_Key has been received. C14-19 is the only
compliance statement that specifies a stopping condition for the timer.

This behavior is magnified if the port's Master SM LID attribute
points at itself. In that case, the M_Key mismatch trap is sufficient to
expire the timer, and the mkey lease attribute is rendered useless.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

6199c896

IB/qib: Fix QLE734X link cycling · f665acb3

Mitko Haralanov authored May 07, 2012

The SERDES was using the incorrect Frequency Loop Bandwidth setting
causing the link to cycle through the Physical link negotiation state
machine.  Fixing the Frequency Loop Bandwidth setting in the SERDES
helps the link come up faster and more reliably.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

f665acb3

IB/qib: Display correct value for number of contexts · 6ceaadee

Mitko Haralanov authored May 07, 2012

A "fix" for a bug with the number of contexts on a single-port board
caused the calculation to be off by one, which causes problems with
the upper layers.  The same problem exists for number of free
contexts, which is also fixed here.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

6ceaadee

IB/qib: Correct ordering of reregister vs. port active events · 4ccf28a2

Todd Rimmer authored May 02, 2012

When a port first goes active with SMA Set(PortInfo) and reregister
bit set, the driver sends up the reregister event followed by a port
active event.

The problem is that in response to reregister event most apps try to
issue a SA query of some sort, but that fails because port is not
active.

The qib driver needs to a trivial change to correct this behavior.

This issue has been there for a while; however the recent serdes work
has probably made the delay between the reregister event and the
active event larger and hence opened the race far enough so that its
being seen more often.

The patch also changes the clientrereg local to a u8 and saves off the
rereg bit into it.  The code following the nested subn_get_portinfo()
now restores that bit per o14-12.2.1 with a logical OR from that copy.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

4ccf28a2

IB/qib: Optimize pio ack buffer allocation · bb77a077

Mike Marciniszyn authored May 07, 2012

This patch optimizes pio buffer allocation in the kernel.

For qib, kernel pio buffers are used for sending acks.  The code to
allocate the buffer would always start at 0 until it found a buffer.

This means that an average of 64 comparisions were done on each
allocate, since the busy bit won't be cleared until the bits are
refreshed when buffers are exhausted.

This patch adds two new fields in the devdata struct, last_pio and
min_kernel_pio.  last_pio is the last buffer that was allocated.
min_kernel_pio is the lowest potential available buffer.

min_kernel_pio is modifed as contexts are allocated and deallocted.
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

bb77a077

IB/qib: Add prefetch for eager buffers · cca195a1

Mike Marciniszyn authored May 02, 2012

Add a prefetch call when a packet has been stored. The nature of the
prefetch is correctly determined by the alternatives mechanism.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

cca195a1

MAINTAINERS: Update qib and ipath entries from QLogic to Intel · 8473c603

Mike Marciniszyn authored May 10, 2012

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

8473c603

11 May, 2012 1 commit

IB/core: Fix mismatch between locked and pinned pages · c4870eb8

Yishai Hadas authored May 10, 2012

Commit bc3e53f6 ("mm: distinguish between mlocked and pinned
pages") introduced a separate counter for pinned pages and used it in
the IB stack.  However, in ib_umem_get() the pinned counter is
incremented, but ib_umem_release() wrongly decrements the locked
counter.  Fix this.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c4870eb8

08 May, 2012 9 commits

IB/mlx4: Replace printk(KERN_yyy...) with pr_yyy(...) · 987c8f8f

Shlomo Pongratz authored Apr 29, 2012

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>

[ Replace one more printk_once() with pr_info_once().  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

987c8f8f

mlx4_core: Add second capabilities flags field · b3416f44

Shlomo Pongratz authored Apr 29, 2012

This patch adds a 64-bit flags2 features member to struct mlx4_dev to
export further features of the hardware.  The original flags field
tracks features whose support bits are advertised by the firmware in
offsets 0x40 and 0x44 of the query device capabilities command.
flags2 will track features whose support bits are scattered at various
offsets.

RSS support is the first feature to be exported through flags2.  RSS
capabilities are located at offset 0x2e.  The size of the RSS
indirection table is also given in this offset.
Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

b3416f44

IB/mlx4: Put priority bits in WQE of IBoE MLX QP · c0c1d3d7

Oren Duer authored Apr 29, 2012

Otherwise CM packets going over MLX QP1 get fixed scheduling priority 0.
We want CM packets to get the same scheduling priority, and therefore
map to the same SQ (Schedule Queue) and eventually TC (Traffic Class),
as the application requested for the actual QP used for the connection.
Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c0c1d3d7

IB/mlx4: Add raw packet QP support · 3987a2d3

Or Gerlitz authored Jan 17, 2012

Implement raw packet QPs for Ethernet ports using the MLX transport (as
done by the mlx4_en Ethernet netdevice driver).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

3987a2d3

IB/core: Add raw packet QP type · c938a616

Or Gerlitz authored Mar 01, 2012

IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.

This QP type is designed for userspace direct access to Ethernet; for
example by applications that do TCP/IP themselves.  Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

c938a616

RDMA/ocrdma: Fix build with IPV6=n · 34955669

Roland Dreier authored May 02, 2012

When IPV6 is not enabled:

    ERROR: "register_inet6addr_notifier" [drivers/infiniband/hw/ocrdma/ocrdma.ko] undefined!
    ERROR: "unregister_inet6addr_notifier" [drivers/infiniband/hw/ocrdma/ocrdma.ko] undefined!

Fix this by wrapping the inet6 calls in #ifdef IPV6.  Also make the
ocrdma module depend on (IPV6 || IPV6=n) to forbid the case of modular
ipv6 but built-in ocrdma (which can't work, because ocrdma calls ipv6
functions).
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>

34955669

RDMA/ocrdma: Tiny locking cleanup · d19081e0

Dan Carpenter authored May 02, 2012

We only need to disable the IRQs one time.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

[ Rename "wq_flags" to more conventional "flags."  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

d19081e0

RDMA/ocrdma: Fix check for NULL instead of IS_ERR · 55a8d62a

Dan Carpenter authored May 02, 2012

The ocrdma_alloc_lkey() function never returns NULL pointers -- it
returns ERR_PTRs.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

55a8d62a

RDMA/ocrdma: Don't sleep in atomic notifier handler · 3e4d60a8

Sasha Levin authored Apr 28, 2012

Events sent to ocrdma_inet6addr_event() are sent from an atomic context,
therefore we can't try to lock a mutex within the notifier callback.

We could just switch the mutex to a spinlock since all it does it
protect a list, but I've gone ahead and switched the list to use RCU
instead.  I couldn't fully test it since I don't have IB hardware, so
if it doesn't fully work for some reason let me know and I'll switch
it back to using a spinlock.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>

[ Fixed locking in ocrdma_add().  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

3e4d60a8