Commits · 73f526da0260db5376951373c267596993dc13a8 · Kirill Smelkov / linux

11 Sep, 2009 2 commits
- Merge branch 'mad' into for-linus · 73f526da
  Roland Dreier authored Sep 10, 2009
```
Conflicts:
	drivers/infiniband/core/mad.c
```
  73f526da
- Merge branches 'cxgb3', 'ehca', 'ipath', 'ipoib', 'misc', 'mlx4', 'mthca' and 'nes' into for-linus · 45c448a1
  Roland Dreier authored Sep 10, 2009
  
  45c448a1

09 Sep, 2009 3 commits

RDMA/iwcm: Reject the connection when the cm_id is destroyed · cb58160e

Steve Wise authored Sep 09, 2009

If the cm_id of a connect request is destroyed prior to the ULP
accepting or rejecting the connection, then the provider never cleans
up the connection.  The iwcm should explicitly reject these
connections if the cm_id is destroyed.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

cb58160e

RDMA/cxgb3: Clean up properly on FW mismatch failures · ffc40c64

Steve Wise authored Sep 09, 2009

FW mismatches can cause a crash in the iw_cxgb3 event handler.

- NULL the t3cdev->ulp pointer on failures in cxio_rdev_open()
- Silently ignore events when the ulp ptr is NULL in iwch_err_handler()
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

ffc40c64

RDMA/cxgb3: Don't ignore insert_handle() failures · 13a23933

Steve Wise authored Sep 09, 2009

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

13a23933

08 Sep, 2009 1 commit

MAINTAINERS: InfiniBand/RDMA mailing list transition to vger · e6cc0fd1

Roland Dreier authored Sep 07, 2009

InfiniBand/RDMA development discussion is moving from
general@lists.openfabrics.org to linux-rdma@vger.kernel.org.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

e6cc0fd1

07 Sep, 2009 2 commits

IB/mad: Allow tuning of QP0 and QP1 sizes · b76aabc3

Hal Rosenstock authored Sep 07, 2009

MADs are UD and can be dropped if there are no receives posted, so
allow receive queue size to be set with a module parameter in case the
queue needs to be lengthened.  Send side tuning is done for symmetry
with receive.
Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

b76aabc3

IB/mad: Fix possible lock-lock-timer deadlock · 6b2eef8f

Roland Dreier authored Sep 07, 2009

Lockdep reported a possible deadlock with cm_id_priv->lock,
mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
happens because the mad module does

	cancel_delayed_work(&mad_agent_priv->timed_work);

while holding mad_agent_priv->lock.  cancel_delayed_work() internally
does del_timer_sync(&mad_agent_priv->timed_work.timer).

This can turn into a deadlock because mad_agent_priv->lock is taken
inside cm_id_priv->lock, so we can get the following set of contexts
that deadlock each other:

 A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
 B: holding mad_agent_priv->lock, waiting for del_timer_sync()
 C: interrupt during mad_agent_priv->timed_work.timer that takes
    cm_id_priv->lock

Fix this by using the new __cancel_delayed_work() interface (which
internally does del_timer() instead of del_timer_sync()) in all the
places where we are holding a lock.

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

6b2eef8f

06 Sep, 2009 32 commits

RDMA/nes: Map MTU to IB_MTU_* and correctly report link state · cd1d3f7a

Chien Tung authored Sep 05, 2009

Old query_port code reports static MTU and link state values.
Instead, map actual MTU to next largest IB_MTU_* constant and
correctly report link state.

Cc: Steve Wise <swise@opengridcomputing.com>
Reported-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Chien Tung <chien.tin.tung@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

cd1d3f7a
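
The MTU mapping described in this commit can be sketched in standalone C. This is a minimal illustration, not the driver's code: the `sketch_` names are hypothetical, the enum values mirror the kernel's `enum ib_mtu`, and the choice to pick the largest IB MTU that still fits within the device MTU is an assumption about what "next largest" means here.

```c
#include <stdint.h>

/* Values mirror the kernel's enum ib_mtu; the sketch_ prefix marks these
 * as illustrative names, not kernel identifiers. */
enum sketch_ib_mtu {
	SKETCH_IB_MTU_256  = 1,
	SKETCH_IB_MTU_512  = 2,
	SKETCH_IB_MTU_1024 = 3,
	SKETCH_IB_MTU_2048 = 4,
	SKETCH_IB_MTU_4096 = 5,
};

/* Assumption: report the largest IB MTU that fits in the device MTU,
 * falling back to 256 for anything smaller than 512. */
enum sketch_ib_mtu sketch_mtu_to_ib_mtu(uint32_t netdev_mtu)
{
	if (netdev_mtu >= 4096)
		return SKETCH_IB_MTU_4096;
	if (netdev_mtu >= 2048)
		return SKETCH_IB_MTU_2048;
	if (netdev_mtu >= 1024)
		return SKETCH_IB_MTU_1024;
	if (netdev_mtu >= 512)
		return SKETCH_IB_MTU_512;
	return SKETCH_IB_MTU_256;
}
```

Under this reading, a standard 1500-byte Ethernet MTU would be reported as the 1024-byte IB MTU, and a 9000-byte jumbo MTU as the 4096-byte one.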

RDMA/nes: Rework the disconn routine for terminate and flushing · b29a4fc4

Don Wood authored Sep 05, 2009

The disconn routine has been reworked to accommodate the terminate and
flushing changes.  The routine has been reorganized to make all the
decisions at the start and then perform all the required operations.
This simplifies the lock handling and makes the code easier to follow.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

b29a4fc4

RDMA/nes: Use the flush code to fill in cqe error · 320cdfd2

Don Wood authored Sep 05, 2009

Use the flush status to fill in cqe status when a specific error has
been identified.  Subsequent flushed completions still use the flushed
value.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

320cdfd2

RDMA/nes: Make poll_cq return correct number of wqes during flush · 6eed5e7c

Don Wood authored Sep 05, 2009

When a flush request is given to the hw, it will place one cqe marked
as flushed (unless there is nothing to flush).  An application that is
waiting for all wqe's to complete will be left hanging.  This modifies
poll_cq to return the correct number of flushes for the pending
elements on the wq.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

6eed5e7c

RDMA/nes: Use flush mechanism to set status for wqe in error · 4b281fae

Don Wood authored Sep 05, 2009

When an asynchronous event occurs that requires a terminate, it is
sometimes possible to identify the wqe in error.  This change uses
flush to get this information to the poll routine.  The flush
operation puts the status into the cqe.  If this information is not
available, it continues to use the more generic flush code as before.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

4b281fae

RDMA/nes: Implement Terminate Packet · 8b1c9dc4

Don Wood authored Sep 05, 2009

Implement the sending and receiving of Terminate packets.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

8b1c9dc4

RDMA/nes: Add CQ error handling · 3c28b445

Don Wood authored Sep 05, 2009

CQ errors are not being handled correctly.  Put in the upcall for
CQ errors.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

3c28b445

RDMA/nes: Clean out CQ completions when QP is destroyed · 5ee21fe0

Don Wood authored Sep 05, 2009

When a QP is destroyed, unprocessed CQ entries could still reference
the QP.  This change zeroes the context value at QP destroy time.  By
skipping over cqe's with a zero context, poll_cq no longer processes a
cqe for a destroyed QP.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

5ee21fe0
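
The zero-context technique described in this commit can be illustrated with a simplified userspace model. The struct layout and the `sketch_` function names below are hypothetical and do not correspond to the nes driver's actual data structures; the sketch only shows the idea of clearing the context at destroy time and skipping cleared entries when polling.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_cqe {
	uint64_t context;  /* identifies the owning QP; 0 means "QP destroyed" */
	int      status;
};

/* At QP destroy time: zero the context of every unprocessed entry that
 * still refers to the QP being destroyed. */
void sketch_clean_cq(struct sketch_cqe *cq, size_t n, uint64_t qp_context)
{
	for (size_t i = 0; i < n; i++)
		if (cq[i].context == qp_context)
			cq[i].context = 0;
}

/* Poll: copy up to max live entries into out, skipping entries whose
 * context was zeroed. Returns the number of entries delivered. */
size_t sketch_poll_cq(const struct sketch_cqe *cq, size_t n,
		      struct sketch_cqe *out, size_t max)
{
	size_t delivered = 0;

	for (size_t i = 0; i < n && delivered < max; i++) {
		if (cq[i].context == 0)
			continue;  /* entry belonged to a destroyed QP */
		out[delivered++] = cq[i];
	}
	return delivered;
}
```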

RDMA/nes: Change memory allocation for cqp request to GFP_ATOMIC · ba0c5d9a

Don Wood authored Sep 05, 2009

The routine to allocate a cqp request is not called from process
context code.  Since it is not OK to sleep, it needs to use GFP_ATOMIC
not GFP_KERNEL.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

ba0c5d9a

RDMA/nes: Allocate work item for disconnect event handling · 873fcdd4

Don Wood authored Sep 05, 2009

The code currently has a work structure in the QP. This requires a
lock and a pending flag to ensure there is never more than one request
active. When two events happen quickly (such as FIN and LLP CLOSE),
it causes unnecessary timeouts since the second one is dropped.

This fix allocates memory for the work request so the second one can
be queued. A lock is removed since it is no longer needed.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

873fcdd4

RDMA/nes: Update refcnt during disconnect · c4c3f279

Don Wood authored Sep 05, 2009

During termination, it is possible for the refcnt to go to zero while
the worker thread is posting events upward.  This fix increments the
refcnt before the request is passed to the worker thread.  The thread
decrements the refcnt when the request is completed.
Signed-off-by: Don Wood <donald.e.wood@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

c4c3f279
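
The reference-counting order described in this commit can be sketched as follows. This is a single-file model with hypothetical `sketch_` names and a `freed` flag standing in for the real teardown; it only illustrates taking the reference before the hand-off and dropping it when the worker finishes.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_qp {
	atomic_int refcnt;
	bool       freed;  /* stands in for actually freeing the QP */
};

void sketch_qp_init(struct sketch_qp *qp)
{
	atomic_init(&qp->refcnt, 1);  /* the owner's reference */
	qp->freed = false;
}

void sketch_qp_get(struct sketch_qp *qp)
{
	atomic_fetch_add(&qp->refcnt, 1);
}

void sketch_qp_put(struct sketch_qp *qp)
{
	/* atomic_fetch_sub returns the previous value: 1 means we hit zero. */
	if (atomic_fetch_sub(&qp->refcnt, 1) == 1)
		qp->freed = true;
}

/* Take the extra reference before the request reaches the worker, so the
 * QP cannot disappear while the worker is still using it. */
void sketch_queue_disconnect(struct sketch_qp *qp)
{
	sketch_qp_get(qp);
	/* ... hand the request to the worker thread here ... */
}

/* Worker side: drop the reference only once the request is complete. */
void sketch_disconnect_worker(struct sketch_qp *qp)
{
	/* ... post events upward here ... */
	sketch_qp_put(qp);
}
```

With this ordering, the owner dropping its own reference while the request is still queued leaves the count at one, and the object is released only when the worker is done.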

IB/mthca: Don't allow userspace open while recovering from catastrophic error · d8410647

Jack Morgenstein authored Sep 05, 2009

Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_open_device() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

d8410647
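
The "active" flag gating described in this commit can be modelled in a few lines. The struct, the function names, and the choice of `-ENODEV` as the failure code are all assumptions made for illustration; the commit message does not say which error the real driver returns.

```c
#include <errno.h>
#include <stdbool.h>

struct sketch_device {
	bool active;      /* cleared before the fatal event is reported */
	int  open_count;
};

/* Opens fail while the device is recovering from a catastrophic error. */
int sketch_open_device(struct sketch_device *dev)
{
	if (!dev->active)
		return -ENODEV;  /* assumption: illustrative error code */
	dev->open_count++;
	return 0;
}

/* Fatal-event flow: mark the device inactive first, then notify users. */
void sketch_fatal_event(struct sketch_device *dev)
{
	dev->active = false;
	/* ... generate the fatal async event here ... */
}

/* Called once the device has come back up. */
void sketch_recovered(struct sketch_device *dev)
{
	dev->active = true;
}
```

Because the flag is cleared before the event is generated, an application that reacts to the event by retrying the open cannot succeed until recovery has finished.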

IB/mthca: Distinguish multiple devices in /proc/interrupts · d94a8689

Arputham Benjamin authored Sep 05, 2009

The mthca driver uses the same name for interrupts for every
device in the system.  This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for.  Change the driver
to add the PCI name of the device to the interrupt name.
Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

d94a8689

IB/mthca: Annotate CQ locking · ffe063f3

Roland Dreier authored Sep 05, 2009

mthca_ib_lock_cqs()/mthca_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks.  Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).
Signed-off-by: Roland Dreier <rolandd@cisco.com>

ffe063f3

IB/mthca: Remove unnecessary include of <linux/init.h> · deecb5d6

Roland Dreier authored Sep 05, 2009

mthca_reset.c doesn't have any function annotations, so there's no
reason to include <linux/init.h>.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

deecb5d6

IB/mthca: Remove unnecessary include of <asm/page.h> · fc128558

Roland Dreier authored Sep 05, 2009

mthca_config_reg.h was including <asm/page.h> for no reason -- the whole
file is just defines of constants, so it's entirely self-contained.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

fc128558

IB/mlx4: Don't allow userspace open while recovering from catastrophic error · 3b4a8cd5

Jack Morgenstein authored Sep 05, 2009

Userspace apps are supposed to release all ib device resources if they
receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
app has no way of knowing when the device has come back up, except to
repeatedly attempt ibv_open_device() until it succeeds.

However, currently there is no protection against the open succeeding
while the device is being removed following the fatal event.  In
this case, the open will succeed, but as a result the device waits in
the middle of its removal until the new app releases its resources --
and the new app will not do so, since the open succeeded at a point
following the fatal event generation.

This patch adds an "active" flag to the device. The active flag is set
to false (in the fatal event flow) before the "fatal" event is
generated, so any subsequent ibv_open_device() call to the device will
fail until the device comes back up, thus preventing the above
deadlock.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

3b4a8cd5

mlx4_core: Distinguish multiple devices in /proc/interrupts · f5f5951c

Arputham Benjamin authored Sep 05, 2009

The mlx4 driver uses the same name for interrupts for every
device in the system.  This can make it very confusing trying to work
out exactly which device MSI-X interrupts are for.  Change the driver
to add the PCI name of the device to the interrupt name.
Signed-off-by: Arputham Benjamin <abenjamin@sgi.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

f5f5951c

mlx4_core: Avoid double free_icms · 1af92e2a

Yevgeny Petrilin authored Sep 05, 2009

On the error path of mlx4_init_hca(), mlx4_close_hca() is called,
followed by mlx4_free_icms() and mlx4_UNMAP_FA().  But both those
functions are also called from mlx4_close_hca(), which leads to a
double free.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

1af92e2a

mlx4_core: Allocate and map sufficient ICM memory for EQ context · fa0681d2

Roland Dreier authored Sep 05, 2009

The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs.  However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128 byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.

This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and sufficiently odd ACPI tables that it
shows on boot

    SMP: Allowing 32 CPUs, 16 hotplug CPUs

so num_possible_cpus() ends up 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs.  This mlx4 bug means that mlx4 can't even
initialize at all on this quite mainstream system.

Cc: <stable@kernel.org>
Reported-by: Eli Cohen <eli@mellanox.co.il>
Tested-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

fa0681d2
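
The arithmetic behind the limit in this commit is easy to check. The sketch below uses only the figures quoted in the message (4 KB pages, 128-byte EQ context entries, 33 EQs on the affected machine); the `sketch_` names are hypothetical.

```c
#include <stdbool.h>

enum {
	SKETCH_PAGE_SIZE       = 4096,  /* 4 KB host page */
	SKETCH_EQ_CONTEXT_SIZE = 128,   /* bytes per EQ context entry */
};

/* How many EQ context entries fit in the single page the old code allocated. */
unsigned sketch_eq_contexts_per_page(void)
{
	return SKETCH_PAGE_SIZE / SKETCH_EQ_CONTEXT_SIZE;
}

/* True if one page is enough to hold the contexts for num_eqs EQs. */
bool sketch_single_page_suffices(unsigned num_eqs)
{
	return num_eqs <= sketch_eq_contexts_per_page();
}
```

One page holds 4096 / 128 = 32 entries, so the 33 EQs created on the machine described above need one entry more than a single page can provide.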

IB/mlx4: Annotate CQ locking · 338a8fad

Roland Dreier authored Sep 05, 2009

mlx4_ib_lock_cqs()/mlx4_ib_unlock_cqs() are helper functions that
lock/unlock both CQs attached to a QP in the proper order to avoid
AB-BA deadlocks.  Annotate this so sparse can understand what's going
on (and warn us if we misuse these functions).
Signed-off-by: Roland Dreier <rolandd@cisco.com>

338a8fad

mlx4_core: Remove unnecessary includes of <linux/init.h> · ff149b2a

Roland Dreier authored Sep 05, 2009

Lots of mlx4 files with no function annotations included <linux/init.h>
for no reason.
Signed-off-by: Roland Dreier <rolandd@cisco.com>

ff149b2a

mlx4_core: Use pci_request_regions() · a01df0fe

Roland Dreier authored Sep 05, 2009

The old code used two calls to pci_request_region() to get the two BARs
for the mlx4 device, for no particularly good reason. Clean up the code
a little by converting this to a single call to pci_request_regions().
Signed-off-by: Roland Dreier <rolandd@cisco.com>

a01df0fe

RDMA/amso1100: Check kmalloc() result in c2_register_device() · 1493ab40

Roel Kluin authored Sep 05, 2009

dev->ibdev.iwcm allocation may fail, prevent a dereference.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

1493ab40

IB/uverbs: Return ENOSYS for unimplemented commands (not EINVAL) · b1b8afb8

Jack Morgenstein authored Sep 05, 2009

Since the original commit 883a99c7 ("[IB] uverbs: Add a mask of device
methods allowed for userspace"), the uverbs core returns EINVAL for
commands not implemented by a specific low-level driver.

This creates a problem that there is no way to tell the difference
between an unimplemented command and an implemented one which is
incorrectly invoked (which also returns EINVAL).

The fix is to have unimplemented commands return ENOSYS.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

b1b8afb8
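
The distinction this commit introduces can be shown with a toy dispatcher. The command numbers, the mask layout, and the argument check are hypothetical and much simpler than the uverbs core; the point is only that a command missing from the device's mask returns `-ENOSYS`, while a supported command invoked incorrectly returns `-EINVAL`.

```c
#include <errno.h>
#include <stdint.h>

enum sketch_cmd {
	SKETCH_CMD_QUERY  = 0,
	SKETCH_CMD_RESIZE = 1,
	SKETCH_CMD_MAX
};

struct sketch_uverbs_dev {
	uint64_t cmd_mask;  /* bit N set: the driver implements command N */
};

int sketch_dispatch(const struct sketch_uverbs_dev *dev, unsigned cmd, int arg)
{
	if (cmd >= SKETCH_CMD_MAX)
		return -EINVAL;  /* not a known command number at all */
	if (!(dev->cmd_mask & (1ULL << cmd)))
		return -ENOSYS;  /* known command, not implemented by this driver */
	if (arg < 0)
		return -EINVAL;  /* implemented, but invoked incorrectly */
	return 0;
}
```

A caller can then treat `-ENOSYS` as "fall back to another method" and `-EINVAL` as a bug in its own request.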

IB/core: Fix send multicast group leave retry · e1d7806d

Yossi Etigin authored Sep 05, 2009

Until now, retries were only sent when joining a multicast group. This
patch adds retries when leaving a multicast group as well.
Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

e1d7806d

IB: Use printk_once() for driver versions · f1aa78b2

Marcin Slusarz authored Sep 05, 2009

Replace open-coded reimplementations with printk_once().
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

f1aa78b2

RDMA/amso1100: Use %pM conversion specifier · 181c74e8

Tobias Klauser authored Sep 05, 2009

Use the %pM conversion specifier to print a MAC address.
Signed-off-by: Tobias Klauser <klto@zhaw.ch>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

181c74e8

IB: Use DEFINE_SPINLOCK() for static spinlocks · 6276e08a

Roland Dreier authored Sep 05, 2009

Rather than just defining static spinlock_t variables and then
initializing them later in init functions, simply define them with
DEFINE_SPINLOCK() and remove the calls to spin_lock_init(). This cleans
up the source a tad and also shrinks the compiled code; eg on x86-64:

    add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-40 (-40)
    function                 old     new   delta
    ib_uverbs_init           336     326     -10
    ib_mad_init_module       147     137     -10
    ib_sa_init               123     103     -20
Signed-off-by: Roland Dreier <rolandd@cisco.com>

6276e08a

IB/mad: Check hop count field in directed route MAD to avoid array overflow · 60f2b652

Roland Dreier authored Sep 05, 2009

The hop count field in a directed route MAD is only allowed to be in the
range 0 to 63 (by spec).  Check that this really is the case to avoid
accessing outside the bounds of the hop array.
Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

60f2b652
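
The bounds check described in this commit amounts to the following. The struct is a simplified, hypothetical stand-in for a directed route SMP with 64-entry path arrays, matching the 0 to 63 range given in the message; it is not the kernel's actual layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PATH_HOPS 64  /* valid hop counts are 0..63 */

struct sketch_dr_smp {
	uint8_t hop_ptr;
	uint8_t hop_cnt;
	uint8_t initial_path[SKETCH_MAX_PATH_HOPS];
	uint8_t return_path[SKETCH_MAX_PATH_HOPS];
};

/* Reject any MAD whose hop count could index past the path arrays.
 * hop_cnt arrives from the wire, so it can hold any value up to 255. */
bool sketch_hop_cnt_valid(const struct sketch_dr_smp *smp)
{
	return smp->hop_cnt < SKETCH_MAX_PATH_HOPS;
}
```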

IPoIB: Check multicast address format · 5e47596b

Jason Gunthorpe authored Sep 05, 2009

Check that the format of multicast link addresses is correct before
taking them from dev->mc_list to priv->multicast_list. This way we
never try to send a bogus address to the SA, which prevents badness
from erroneous 'ip maddr addr add', broken bonding drivers, etc.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

5e47596b
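
A format check of the kind this commit describes might look like the sketch below. The commit message does not spell out which fields are validated, so the rule used here is an assumption: the address must have the 20-byte IPoIB link-layer length, and its first six bytes must match the interface's broadcast address. All `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INFINIBAND_ALEN 20  /* IPoIB link-layer address length */

/* Assumption: accept an address only if it has the IPoIB length and its
 * leading six bytes match the broadcast address. The real driver's
 * checks may cover different or additional fields. */
bool sketch_mcast_addr_ok(const uint8_t *addr, size_t len,
			  const uint8_t *broadcast)
{
	if (len != SKETCH_INFINIBAND_ALEN)
		return false;
	return memcmp(addr, broadcast, 6) == 0;
}
```

Filtering at this point means a malformed entry never reaches the list of groups to join, so it is never sent to the SA.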

IPoIB: Drop priv->lock before calling ipoib_send() · 721d67cd

Roland Dreier authored Sep 05, 2009

IPoIB currently must use irqsave locking for priv->lock, since it is
taken from interrupt context in one path. However, ipoib_send() does
skb_orphan(), and the network stack locking is not IRQ-safe.
Therefore we need to make sure we don't hold priv->lock when calling
ipoib_send() to avoid lockdep warnings (the code was almost certainly
safe in practice, since the only code path that takes priv->lock from
interrupt context would never call into the network stack).

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

721d67cd