- 25 May, 2016 5 commits
-
-
Mark Bloch authored
In IB networks, and specifically in IPoIB/rdmacm traffic, the device address of an IPoIB interface is used as a means to exchange information between nodes needed for communication. Currently an IPoIB interface will always be created with a device address based on its node GUID without a way to change that. This change adds the ability to set the device address of an IPoIB interface by value. We use the set mac address ndo to do that. The flow should be broken down to two: 1) The GID value is already in the GID table, in this case the interface will be able to set carrier up. 2) The GID value is not yet in the GID table, in this case the interface won't try to join the multicast group and will wait (listen on GID_CHANGE event) until the GID is inserted. In order to track those changes, we add a new flag: * IPOIB_FLAG_DEV_ADDR_SET. When set, it means the dev_addr is a based on a value in the gid table. this bit will be cleared upon a dev_addr change triggered by the user and set after validation. Per IB spec the port GUID can't change if the module is loaded. port GUID is the basis for GID at index 0 which is the basis for the default device address of a ipoib interface. The issue is that there are devices that don't follow the spec, they change the port GUID while HCA is powered on, so in order not to break userspace applications. We need to check if the user wanted to control the device address and we assume that if he sets the device address back to be based on GID index 0, he no longer wishs to control it. In order to track this, we add an additional flag: * IPOIB_FLAG_DEV_ADDR_CTRL When setting the device address, there is no validation of the upper twelve bytes of the device address (flags, qpn, subnet prefix) as those bytes are not under the control of the user. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Erez Shitrit authored
Check (via an SA query) if the SM supports the new option for SendOnly multicast joins. If the SM supports that option it will use the new join state to create such multicast group. If SendOnlyFullMember is supported, we wouldn't use faked FullMember state join for SendOnly MCG, use the correct state if supported. This check is performed at every invocation of mcast_restart task, to be sure that the driver stays in sync with the current state of the SM. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Erez Shitrit authored
There are four types for MCG, FullMember, NonMember, SendOnlyNonMember, and the new added type: SendOnlyFullMember. Add support for the new SendOnlyFullMember join state. The new type allows host to send join request as sendonly, it will cause the group to be created but without getting packets from this multicast back to the host. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Lameter <cl@linux.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Erez Shitrit authored
New SA query function to return the ClassPortInfo struct from the SA. If the SM supports FullMemberSendOnly mode for MCG's, it sets a capability bit in the capability_mask2 field of the response. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Erez Shitrit authored
Change struct ib_class_port_info to conform to IB Spec 1.3 That in order to get specific capability mask from ClassPortInfo mad. >From the IB Spec, ClassPortInfo section: "CapabilityMask2 Bits 0-26: Additional class-specific capabilities... RespTimeValue the rest 5 bits" The new struct now has one field for capabilitymask2 (previously was the reserved field) and the resp_time field. And it fixes up qib and srpt, use of the field repurposed to be used as capabilitymask2: IB/qib: Change pma_get_classportinfo IB/srpt: Adjust the use of ib_class_port_info Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 18 May, 2016 4 commits
-
-
Matan Barak authored
Previously, mlx5_ib_cq_comp was executed from interrupt context. Under heavy load, this could cause the CPU core to be in an interrupt context too long. Instead of executing the handler from the interrupt context we execute it from a much friendly tasklet context. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Matan Barak authored
Previously, we've fired all our completion callbacks straight from our ISR. Some of those callbacks were lightweight (for example, mlx5 Ethernet napi callbacks), but some of them did more work (for example, the user-space RDMA stack uverbs' completion handler). Besides that, doing more than the minimal work in ISR is generally considered wrong, it could even lead to a hard lockup of the system. Since when a lot of completion events are generated by the hardware, the loop over those events could be so long, that we'll get into a hard lockup by the system watchdog. In order to avoid that, add a new way of invoking completion events callbacks. In the interrupt itself, we add the CQs which receive completion event to a per-EQ list and schedule a tasklet. In the tasklet context we loop over all the CQs in the list and invoke the user callback. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Christoph Lameter authored
In the Ethernet/TCP world, CAP_NET_RAW is sufficient to allow a program to listen to all incoming packets on a specific interface, and the higher CAP_NET_ADMIN is required to set the interface into promiscuous mode. We want to emulate that same basic division of privilege in the RDMA stack, so when dealing with Raw Ethernet QPs, allow apps with CAP_NET_RAW to listen to all incoming flows (and direct them as they see fit in their own listen stream). Do not require CAP_NET_ADMIN just to listen to traffic already incoming. Reserve CAP_NET_ADMIN if we attempt to set promiscuous mode. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
shamir rabinovitch authored
The problem is that the function 'send_reply_to_slave' gets the 'req_sa_mad' as a pointer whose address is only aliged to 4 bytes but is 8 bytes in size. This can result in unaligned access faults on certain architectures. Sowmini Varadhan pointed to this reply from Dave Miller that say that memcpy should not be used to solve alignment issues: https://lkml.org/lkml/2015/10/21/352 Optimization of memcpy to 'ldx' instruction can only happen if the compiler knows that the size of the data we are copying is 8 bytes and it assumes it is aligned to 8 bytes. If the compiler know the type is not aligned to 8 it must not optimize the 8 byte copy. Defining the data type as aligned to 4 forces the compiler to treat all accesses as though they aren't aligned and avoids the 'ldx' optimization. Full credit for the idea goes to Jason Gunthorpe <jgunthorpe@obsidianresearch.com>. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 13 May, 2016 31 commits
-
-
Doug Ledford authored
-
Majd Dibbiny authored
Report Scatter FCS support when the Firmware supports as well. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Majd Dibbiny authored
Enable Scatter FCS in the RQ context when the user passes Scatter FCS create flag. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Majd Dibbiny authored
Raw Packet QPs that were created with Scatter FCS flag, will scatter the FCS into the receive buffers. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Majd Dibbiny authored
Raw Scatter FCS device capability is set when the NIC supports scattering the FCS to the receive buffers of Raw Packet QPs. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Majd Dibbiny authored
Since all the uverbs device_cap_flags are occupied, we need a place to expose more device capabilities. This patch adds a new 64 bit device_cap_flags_ex to expose new device capabilities. The lower 32 bits will be identical to the original device_cap_flags, The upper 32 bits will be new capabilities. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Colin Ian King authored
passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Lars-Peter Clausen authored
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Julia Lawall authored
The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Guy Levi authored
By this patch, the user space library will be able to improve performance using appropriate ringing DoorBell method according to the memory type it asked for. Currently only one mapping command is allowed for UARs: MLX5_IB_MMAP_REGULAR_PAGE. Using this mapping, the kernel maps the UARs to write-combining (WC) if the system supports it. If the system is not supporting WC the UARs are mapped to non-cached(NC). In this case the user space library can't tell which mapping is applied. This patch adds 2 new mapping commands: MLX5_IB_MMAP_WC_PAGE and MLX5_IB_MMAP_NC_PAGE. For these commands the kernel maps exactly as requested and fails if it can't. Since there is no generic way to check if the requested memory region can be mapped as WC, driver enables conclusive WC mapping only for x86, PowerPC and ARM which support WC for the device's memory region. Signed-off-by: Guy Levy <guyle@mellanox.com> Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Matan Barak authored
The current mlx5 code disallows mapping the free running counter of mlx5 based hardwares when PROT_EXEC is set. Although this behaviour is correct, Linux does add an implicit VM_EXEC to the vm_flags if the READ_IMPLIES_EXEC bit is set in the process personality. This happens for example if the process stack is executable. This causes libmlx5 to output a warning and prevents the user from reading the free running clock. Executing the init segment of the hardware isn't a security risk (at least no more than executing a process own stack), so we just prevent writes to there. Fixes: d69e3bcf ('IB/mlx5: Mmap the HCA's core clock register to user-space') Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Geliang Tang authored
Simplify the code in search_relocate_mgid0_group with by using list_for_each_entry_safe(). Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Mark Bloch authored
Fixes a direct call to kfree_skb when nlmsg_free should be used. Fixes: 2ca546b9 ('IB/sa: Route SA pathrecord query through netlink') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Mark Bloch authored
Fix array overrun when going over callback table. In declaration of callback table, the max size isn't provided and in registration phase, it is provided. There is potential scenario where a new operation is added and it is not supported by current client. The acceptance of such operation by ib_netlink will cause to array overrun. Fixes: 809d5fc9 ("infiniband: pass rdma_cm module to netlink_dump_start") Fixes: b493d91d ("iwcm: common code for port mapper") Fixes: 2ca546b9 ("IB/sa: Route SA pathrecord query through netlink") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Mark Bloch authored
RDMA_NL_GET_OP is defined like this: (type & ((1 << 10) - 1)) which means op (defined as an int) can never be a negative number. Fixes: b2cbae2c ('RDMA: Add netlink infrastructure') Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Mark Bloch authored
In case ibnl_put_msg fails in send_nlmsg_done, the function returns with -ENOMEM without freeing. This patch fixes this behavior. Fixes: 30dc5e63 ("RDMA/core: Add support for iWARP Port Mapper user space service") Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Andy Shevchenko authored
There is no need to duplicate a lot of code that is in the kernel library for ages. Replace duplicating code by calling to print_hex_dump() directly. Note that output is slightly changed: - hex and ascii parts have just two spaces delimeter - there is no delimeter for ascii portions - file and line removed from prefix (they were redundant anyway since previous output shows same closer enough) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Colin Ian King authored
fix spelling mistake, argumant -> argument Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Denys Vlasenko authored
This function compiles to 550 bytes of machine code. Three callsites, all in nes_create_qp. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Faisal Latif <faisal.latif@intel.com> CC: Doug Ledford <dledford@redhat.com> CC: linux-rdma@vger.kernel.org CC: linux-kernel@vger.kernel.org Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Tatyana Nikolova authored
Adding sq and rq drain functions, which block until all previously posted wr-s in the specified queue have completed. A completion object is signaled to unblock the thread, when the last cqe for the corresponding queue is processed. Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Bart Van Assche authored
Avoid that sparse complains about the comparison of s_addr with INADDR_ANY. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Faisal Latif <faisal.latif@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Bart Van Assche authored
Avoid that the following BUG() is triggered against a debug kernel: kernel BUG at include/linux/scatterlist.h:92! RIP: 0010:[<ffffffffa0467199>] [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp] Call Trace: [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp] [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp] [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40 [<ffffffff81299070>] blk_delay_work+0x20/0x30 [<ffffffff81071f07>] process_one_work+0x197/0x480 [<ffffffff81072239>] worker_thread+0x49/0x490 [<ffffffff810787ea>] kthread+0xea/0x100 [<ffffffff8159b632>] ret_from_fork+0x22/0x40 Fixes: f7f7aab1 ("IB/srp: Convert to new registration API") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v4.4+ Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Hans Westgaard Ry authored
IPoIB collects statistics of traffic including number of packets sent/received, number of bytes transferred, and certain errors. This patch makes these statistics available to be queried by ethtool. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Sebastian Andrzej Siewior authored
The last user of pkey_mutex was removed in db84f880 ("IB/ipoib: Use P_Key change event instead of P_Key polling mechanism") but the lock remained. This patch removes it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Colin Ian King authored
passing hw_stats by value requires a 280 byte copy so instead pass it by reference is much more efficient. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Lars-Peter Clausen authored
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Patch was generated using the following semantic patch: // <smpl> @@ expression irq; @@ -synchronize_irq(irq); free_irq(irq, ...); // </smpl> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Faisal Latif <faisal.latif#intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Julia Lawall authored
The i40iw_vf_cqp_ops structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Bart Van Assche authored
__force casts should be avoided if there is a better alternative. Hence modify the comparison of s_addr with INADDR_ANY such that the __force cast is no longer necessary. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Vipul Pandya <vipul@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Hariprasad S authored
These handlers when called print error message to the kernel log, but the actual handling is done by _c4iw_free_ep() and process_timeout(). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Hariprasad S authored
Currently c4iw_peer_abort_intr() does not wake up the waiter if the endpoint state indicates we're using MPAv2 and we're currently trying to connect. This was introduced with commit 7c0a33d6 ("RDMA/cxgb4: Don't wakeup threads for MPAv2") However, this original fix is flawed because it introduces a race that can cause a deadlock of the iwarp stack. Here is the race: ->local side sets up an active offload connection. ->local side sends MPA_START request. ->peer sends MPA_START response. ->local side ingress cpl thread begins processing the MPA_START response, but before it changes the state from MPA_REQ_SENT to FPDU_MODE: ->peer sends a RST which results in a ABORT_REQ_RSS. This triggers peer_abort_intr() which sees the state in MPA_REQ_SENT and since mpa_rev is 2, it will avoid waking up the endpoint with -ECONNRESET, assuming the stack will re-attempt the connection using MPAv1. ->Meanwhile, the cpl thread moves the state to FPDU_MODE and calls c4iw_modify_rc_qp() which calls rdma_init() which sends a RI_WR/INIT WR to firmware. But since HW sent an abort, FW correctly drops the RI_WR/INIT WR. ->So the cpl thread is stuck waiting for a reply and cannot process the ABORT_REQ_RSS cpl sitting in its input queue. Thus everything comes to a halt because no more ingress cpls are processed by the stack... The correct fix for the issue is to always do the wake up in c4iw_abort_intr() but reinitialize the wait object in c4iw_reconnect(). Fixes: 7c0a33d6 ("RDMA/cxgb4: Don't wakeup threads for MPAv2") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Hariprasad S authored
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-