1. 25 Apr, 2017 10 commits
  2. 21 Apr, 2017 30 commits
    • IB/mlx5: Add support for active_width and active_speed in RoCE · f1b65df5
      Noa Osherovich authored
      Add missing calculation and translation of active_width and
      active_speed for RoCE.
      
      Fixes: 3f89a643 ('IB/mlx5: Extend query_device/port to ...')
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Set mlx5_query_roce_port's return value to void · 50f22fd8
      Noa Osherovich authored
      In case of an error, the properties reported to user
      are zeroed out, so no need for a return value.
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Add HDR speed enum · 12113a35
      Noa Osherovich authored
      Add high data rate speed to the ib_port_speed enumeration.
      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Set correct SL in completion for RoCE · 12f8fede
      Moni Shoua authored
      There is a difference when parsing a completion entry between Ethernet
      and IB ports. When link layer is Ethernet the bits describe the type of
      L3 header in the packet. In the case when link layer is Ethernet and VLAN
      header is present the value of SL is equal to the 3 UP bits in the VLAN
      header. If VLAN header is not present then the SL is undefined and consumer
      of the completion should check if IB_WC_WITH_VLAN is set.
      
      While at it, this patch also fills the vlan_id field in the completion
      if a VLAN header is present.
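
The relationship between the VLAN header and the SL described above can be sketched in plain C. This is an illustration only, not the driver's code: the struct and function names are hypothetical, and the only facts relied on are the 802.1Q layout of the tag control word (3 priority bits at the top, 12 VLAN ID bits at the bottom).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields discussed above; not the driver's struct. */
struct wc_vlan_info {
    bool     with_vlan; /* plays the role of the IB_WC_WITH_VLAN flag */
    uint8_t  sl;        /* meaningful only when with_vlan is true */
    uint16_t vlan_id;   /* meaningful only when with_vlan is true */
};

/* Split an 802.1Q Tag Control Information word into priority and VLAN ID. */
static struct wc_vlan_info parse_vlan_tci(bool vlan_present, uint16_t tci)
{
    struct wc_vlan_info info = { .with_vlan = vlan_present, .sl = 0, .vlan_id = 0 };

    if (vlan_present) {
        info.sl = (uint8_t)((tci >> 13) & 0x7);  /* 3 user-priority (UP) bits */
        info.vlan_id = (uint16_t)(tci & 0x0fff); /* low 12 bits */
    }
    return info;
}
```

A consumer would check `with_vlan` first and read `sl` and `vlan_id` only when it is set, mirroring the IB_WC_WITH_VLAN check mentioned in the commit message.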
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/cma: Send MRA for reply messages · 61c0ddbe
      Moni Shoua authored
      Current implementation of RDMA_CM sends MRA (Message Receipt
      Acknowledgment) only for request messages but not for response messages.
      
      As a result, a slow active side of the connection may send the
      ready-to-use message to the passive side after a delay that is longer
      than the passive side is willing to wait.
      
      This patch adds a call to ib_send_cm_mra() upon receiving a response
      message, which tells the other side to raise its service timeout to a
      value 16 times larger than before. As in the request case, the MRA for
      a reply is sent only if a duplicate response has arrived.
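
To make the factor of 16 concrete: InfiniBand CM timeouts are encoded as an exponent n meaning 4.096 µs × 2^n, so raising the exponent by 4 multiplies the timeout by 16. The sketch below assumes exponents 20 and 24 purely for illustration; the commit message states only the ratio, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Timeout in nanoseconds for an encoded exponent: 4.096 us * 2^exponent. */
static uint64_t cm_timeout_ns(unsigned int exponent)
{
    return 4096ULL << exponent;
}

/* Ratio between two encoded timeouts; new_exp >= old_exp is assumed. */
static uint64_t cm_timeout_ratio(unsigned int old_exp, unsigned int new_exp)
{
    return cm_timeout_ns(new_exp) / cm_timeout_ns(old_exp);
}
```

With the assumed values, `cm_timeout_ratio(20, 24)` evaluates to 16, and the two timeouts are roughly 4.3 s and 68.7 s.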
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Matan Barak <matan@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Support congestion related counters · e1f24a79
      Parav Pandit authored
      This patch adds support for querying the congestion-related hardware
      counters through a new command and exposes them alongside the other hw
      counters available in the hw_counters sysfs location.
      
      To reuse the existing infrastructure, it renames the related q_counter
      data structures to the more generic "counters", which covers q_counters,
      congestion counters, and possibly other counters in the future.
      
      New hardware counters:
       * rp_cnp_handled - CNP packets handled by the reaction point
       * rp_cnp_ignored - CNP packets ignored by the reaction point
       * np_cnp_sent    - CNP packets sent by notification point to respond to
                           CE marked RoCE packets
       * np_ecn_marked_roce_packets - CE marked RoCE packets received by
                                      notification point
      
      It also stops returning ENOSYS, which is reserved for invalid
      system calls and triggers the following checkpatch.pl warning:
      
      WARNING: ENOSYS means 'invalid syscall nr' and nothing else
      +		return -ENOSYS;
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mthca: Check validity of output parameter pointer · a43402af
      Leon Romanovsky authored
      The mthca driver didn't check the output parameter pointer supplied to
      mthca_cmd_poll() and mthca_cmd_wait(). This caused the following
      smatch errors:
      
      drivers/infiniband/hw/mthca/mthca_cmd.c:371 mthca_cmd_poll() error: we previously assumed 'out_param' could be null (see line 353)
      drivers/infiniband/hw/mthca/mthca_cmd.c:454 mthca_cmd_wait() error: we previously assumed 'out_param' could be null (see line 432)
      
      In reality, all callers of these functions that set the out_is_imm
      flag also provide the pointer. However, it is better to check again
      to silence the smatch errors and keep the subsystem warning-free.
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Add drop flow steering rule support · a22ed86c
      Slava Shwartsman authored
      A drop rule is described by an action drop and no destination.
      If a user specified IB_FLOW_SPEC_ACTION_DROP then set the action
      to MLX5_FLOW_CONTEXT_ACTION_DROP and clear the destination.
      Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Introduce drop flow specification · 483a3966
      Slava Shwartsman authored
      This flow steering specification marks a flow to be dropped by the HW.
      If the user creates a flow with only the drop specification, all
      packets that hit this flow are dropped; otherwise the HW drops only
      the packets that match the other L2/L3/L4 specifications.
      Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
      Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Use IP version matching to classify IP traffic · 19cc7524
      Ariel Levkovich authored
      This change adds the ability for flow steering to classify IPv4/6
      packets with an MPLS tag (Ethertype 0x8847 and 0x8848) as standard IP
      packets, so that they hit IPv4/6 classified steering rules.
      
      When a user added a flow rule with IP classification, the driver
      implicitly added Ethertype matching to the created rule in order
      to distinguish between the IPv4 and IPv6 protocols.
      Since IP packets with an MPLS tag header carry an MPLS Ethertype, they
      missed the rule and ended up hitting the default filters.
      This prevented MPLS packets from going through the inbound traffic
      load-balancing flows (if such were defined by configuring RSS) that
      give non-MPLS IP packets their higher throughput.
      
      Since our device is able to look past the MPLS tag and identify the
      next protocol, this solution replaces Ethertype matching with the
      device's capability to parse and match the IP version in order to
      distinguish between IPv4 and IPv6.
      Therefore, whenever a flow with an IP spec is added and the device
      supports IP version matching, the driver implicitly adds IP version
      matching to the rule (based on the IP spec type) without Ethertype
      matching, so relevant MPLS-tagged packets hit this rule as well.
      Otherwise (the device doesn't support IP version matching), we fall
      back to setting Ethertype matching.
      
      If the user's filters specify an L2 ethertype and an IP spec
      the rule will then match both the ethertype and the IP version.
      
      The device's support for IP version matching is reported by the
      device via dedicated capability bit in query_device_cap and named
      outer/inner_ip_version.
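
The selection logic described in this message can be sketched as follows. This is a simplified model under stated assumptions, not the mlx5 implementation: `struct ip_match`, `build_ip_match()` and `packet_hits()` are hypothetical names, and the packet is reduced to its outer Ethertype plus the IP version the device would find after looking past any MPLS tag.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETHERTYPE_IPV4 0x0800
#define SKETCH_ETHERTYPE_IPV6 0x86DD
#define SKETCH_ETHERTYPE_MPLS 0x8847

/* Hypothetical match description; the real driver fills device match criteria. */
struct ip_match {
    bool     match_ip_version;
    uint8_t  ip_version;      /* 4 or 6 */
    bool     match_ethertype;
    uint16_t ethertype;
};

/* Prefer IP version matching when the device reports the capability. */
static struct ip_match build_ip_match(bool dev_supports_ip_version, bool spec_is_ipv6)
{
    struct ip_match m = { false, 0, false, 0 };

    if (dev_supports_ip_version) {
        m.match_ip_version = true;
        m.ip_version = spec_is_ipv6 ? 6 : 4;
    } else {
        /* Fallback: distinguish IPv4 from IPv6 by Ethertype. */
        m.match_ethertype = true;
        m.ethertype = spec_is_ipv6 ? SKETCH_ETHERTYPE_IPV6 : SKETCH_ETHERTYPE_IPV4;
    }
    return m;
}

/* Does a packet with this outer Ethertype and parsed IP version hit the rule? */
static bool packet_hits(const struct ip_match *m, uint16_t pkt_ethertype,
                        uint8_t pkt_ip_version)
{
    if (m->match_ethertype && pkt_ethertype != m->ethertype)
        return false;
    if (m->match_ip_version && pkt_ip_version != m->ip_version)
        return false;
    return true;
}
```

In this model an MPLS-tagged IPv4 packet (Ethertype 0x8847, IP version 4) hits a rule built with IP version matching but misses one built with the Ethertype fallback, which is the behavior change the commit describes.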
      Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Add inner spec and IPv6 validation in user's flow attribute list · 0f750966
      Ariel Levkovich authored
      This change fixes an incomplete validation of the user's
      flow attributes list.
      
      The previous implementation validated only that an IPv4 Ethertype
      matched an IPv4 spec in the outer headers (when both an Ethernet spec
      with a specified Ethertype and IP specs were present). It lacked:
      1. Matching of an IPv6 Ethertype in the Ethernet spec (if such exists)
         to an IPv6 protocol spec (if such exists).
      2. Validation of Ethertype to IP protocol matching on inner header specs.
      As a result, some combinations of mismatched Ethernet and IP
      protocols could pass validation and be applied to the device.
      
      The fix adds validation of IPv6 Ethertype and IP spec as well as
      performing the scan on both outer and inner attributes.
      
      Fixes: 038d2ef8 ("Add flow steering support")
      Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Fix wrong use of kfree at bad flow in create_cq_user · 44f2e99e
      Bodong Wang authored
      The kfree was called to free cqb, while it should free *cqb.
      
      Fixes: 1cbe6fc8 ("IB/mlx5: Add support for CQE compressing")
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Enlarge autogroup flow table · 00b7c2ab
      Maor Gottlieb authored
      In order to enlarge the flow group size to 8k, we decrease
      the number of flow group types to 6 and increase the flow
      table size to 64k.
      
      Flow group size is calculated as follows:
        group_size = table_size / (#group_types + 1)
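
As a worked check of the formula: with a 64k table and 6 group types, plain integer division gives 65536 / 7 = 9362, not 8192. The sketch below therefore assumes the result is additionally rounded down to a power of two, which would account for the 8k figure; that rounding step is an assumption made here, not something the commit message states, and the function names are hypothetical.

```c
#include <stdint.h>

/* Largest power of two that is <= v (returns 0 for v == 0). */
static uint32_t rounddown_pow2(uint32_t v)
{
    uint32_t p = 1;

    if (v == 0)
        return 0;
    while (p <= v / 2)
        p *= 2;
    return p;
}

/* group_size = table_size / (#group_types + 1), then rounded down (assumed). */
static uint32_t flow_group_size(uint32_t table_size, uint32_t group_types)
{
    return rounddown_pow2(table_size / (group_types + 1));
}
```

Under that assumption, `flow_group_size(65536, 6)` yields 8192.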
      
      Fixes: 038d2ef8 ('IB/mlx5: Add flow steering support')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Check supported flow table size · dac388ef
      Maor Gottlieb authored
      Check that the required flow table size is supported
      by device. Return ENOMEM error if no space left.
      
      In addition change the create flow table routine
      to return ENOMEM instead of ENOSPC.
      
      Fixes: 038d2ef8 ('IB/mlx5: Add flow steering support')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Change vma from shared to private · 13776612
      Maor Gottlieb authored
      Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
      it would lead to SIGBUS.
      
      Remove the shared flags from the vma after we change it to be
      anonymous.
      
      This is easily reproduced by doing modprobe -r while running a
      user-space application such as raw_ethernet_bw.
      
      Fixes: 7c2344c3 ('IB/mlx5: Implements disassociate_ucontext API')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx5: Take write semaphore when changing the vma struct · ecc7d83b
      Maor Gottlieb authored
      When the driver disassociates a user context, it changes the vma to
      anonymous by setting vm_ops to NULL and zapping the vma PTEs.
      
      To avoid a race in the kernel, we need to take the write lock
      before we change the vma entries.
      
      Fixes: 7c2344c3 ('IB/mlx5: Implements disassociate_ucontext API')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Change vma from shared to private · ca37a664
      Maor Gottlieb authored
      Anonymous VMA (->vm_ops == NULL) cannot be shared, otherwise
      it would lead to SIGBUS.
      
      Remove the shared flags from the vma after we change it to be
      anonymous.
      
      This is easily reproduced by doing modprobe -r while running a
      user-space application such as raw_ethernet_bw.
      
      Fixes: ae184dde ('IB/mlx4_ib: Disassociate support')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Take write semaphore when changing the vma struct · 22c3653d
      Maor Gottlieb authored
      When the driver disassociates a user context, it changes the vma to
      anonymous by setting vm_ops to NULL and zapping the vma PTEs.
      
      To avoid a race in the kernel, we need to take the write lock
      before we change the vma entries.
      
      Fixes: ae184dde ('IB/mlx4_ib: Disassociate support')
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Reduce SRIOV multicast cleanup warning message to debug level · fb7a9174
      Jack Morgenstein authored
      A warning message during SRIOV multicast cleanup should have actually been
      a debug level message. The condition generating the warning does no harm
      and can fill the message log.
      
      In some cases, during testing, some tests were so intense as to swamp the
      message log with these warning messages, causing a stall in the console
      message log output task. This stall caused an NMI to be sent to all CPUs
      (so that they all dumped their stacks into the message log).
      Aside from the message flood causing an NMI, the tests all passed.
      
      Once the message flood which caused the NMI is removed (by reducing the
      warning message to debug level), the NMI no longer occurs.
      
      Sample message log (console log) output illustrating the flood and
      resultant NMI (snippets with comments and modified with ... instead
      of hex digits, to satisfy checkpatch.pl):
      
       <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
       *** About 4000 almost identical lines in less than one second ***
       <mlx4_ib> _mlx4_ib_mcg_port_cleanup: ... WARNING: group refcount 1!!!...
       INFO: rcu_sched detected stalls on CPUs/tasks: { 17} (...)
       *** { 17} above indicates that CPU 17 was the one that stalled ***
       sending NMI to all CPUs:
       ...
       NMI backtrace for cpu 17
       CPU: 17 PID: 45909 Comm: kworker/17:2
       Hardware name: HP ProLiant DL360p Gen8, BIOS P71 09/08/2013
       Workqueue: events fb_flashcursor
       task: ffff880478...... ti: ffff88064e...... task.ti: ffff88064e......
       RIP: 0010:[ffffffff81......]  [ffffffff81......] io_serial_in+0x15/0x20
       RSP: 0018:ffff88064e257cb0  EFLAGS: 00000002
       RAX: 0000000000...... RBX: ffffffff81...... RCX: 0000000000......
       RDX: 0000000000...... RSI: 0000000000...... RDI: ffffffff81......
       RBP: ffff88064e...... R08: ffffffff81...... R09: 0000000000......
       R10: 0000000000...... R11: ffff88064e...... R12: 0000000000......
       R13: 0000000000...... R14: ffffffff81...... R15: 0000000000......
       FS:  0000000000......(0000) GS:ffff8804af......(0000) knlGS:000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080......
       CR2: 00007f2a2f...... CR3: 0000000001...... CR4: 0000000000......
       DR0: 0000000000...... DR1: 0000000000...... DR2: 0000000000......
       DR3: 0000000000...... DR6: 00000000ff...... DR7: 0000000000......
       Stack:
       ffff88064e...... ffffffff81...... ffffffff81...... 0000000000......
       ffffffff81...... ffff88064e...... ffffffff81...... ffffffff81......
       ffffffff81...... ffff88064e...... ffffffff81...... 0000000000......
       Call Trace:
      [<ffffffff813d099b>] wait_for_xmitr+0x3b/0xa0
      [<ffffffff813d0b5c>] serial8250_console_putchar+0x1c/0x30
      [<ffffffff813d0b40>] ? serial8250_console_write+0x140/0x140
      [<ffffffff813cb5fa>] uart_console_write+0x3a/0x80
      [<ffffffff813d0aae>] serial8250_console_write+0xae/0x140
      [<ffffffff8107c4d1>] call_console_drivers.constprop.15+0x91/0xf0
      [<ffffffff8107d6cf>] console_unlock+0x3bf/0x400
      [<ffffffff813503cd>] fb_flashcursor+0x5d/0x140
      [<ffffffff81355c30>] ? bit_clear+0x120/0x120
      [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
      [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
      [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
      [<ffffffff810a5aef>] kthread+0xcf/0xe0
      [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      [<ffffffff81645858>] ret_from_fork+0x58/0x90
      [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
      Code: 48 89 e5 d3 e6 48 63 f6 48 03 77 10 8b 06 5d c3 66 0f 1f 44 00 00 66 66 66 6
      
      As indicated in the stack trace above, the console output task got swamped.
      
      Fixes: b9c5d6a6 ("IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV")
      Cc: <stable@vger.kernel.org> # v3.6+
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Fix ib device initialization error flow · 99e68909
      Jack Morgenstein authored
      In mlx4_ib_add, procedure mlx4_ib_alloc_eqs is called to allocate EQs.
      
      However, in the mlx4_ib_add error flow, procedure mlx4_ib_free_eqs is not
      called to free the allocated EQs.
      
      Fixes: e605b743 ("IB/mlx4: Increase the number of vectors (EQs) available for ULPs")
      Cc: <stable@vger.kernel.org> # v3.4+
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Support RAW Ethernet when RoCE is disabled · dd77abf8
      Majd Dibbiny authored
      In some environments, such as certain SR-IOV VF configurations, RoCE
      isn't supported for mlx4 Ethernet ports. Currently the driver does
      not open an IB device on such a port.
      
      This is problematic since we do want user-space RAW Ethernet QP
      functionality to remain in place. To that end, enhance the relevant
      driver flows so that we do create a device instance in that case.
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Fix sysfs registration error flow · b312be3d
      Jack Morgenstein authored
      The kernel commit cited below restructured ib device management
      so that the device kobject is initialized in ib_alloc_device.
      
      As part of the restructuring, the kobject is now initialized in
      procedure ib_alloc_device, and is later added to the device hierarchy
      in the ib_register_device call stack, in procedure
      ib_device_register_sysfs (which calls device_add).
      
      However, in the ib_device_register_sysfs error flow, if an error
      occurs following the call to device_add, the cleanup procedure
      device_unregister is called. This call results in the device object
      being deleted -- which results in various use-after-free crashes.
      
      The correct cleanup call is device_del -- which undoes device_add
      without deleting the device object.
      
      The device object will then (correctly) be deleted in the
      ib_register_device caller's error cleanup flow, when the caller invokes
      ib_dealloc_device.
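
The difference between the two cleanup calls can be modeled in userspace with a simple reference count. In the kernel, device_unregister() is device_del() followed by put_device(), so calling it in the sysfs error flow drops a reference that the caller's ib_dealloc_device() will drop again. Everything below is a mock with hypothetical `mock_*` names; no memory is actually freed, so the final count can be inspected.

```c
#include <stdbool.h>

/* Userspace stand-in for a refcounted device object; not the kernel's struct. */
struct mock_device {
    int  refcount;
    bool added;          /* visible in the device hierarchy */
    int  release_calls;  /* how many times the object would have been freed */
};

static void mock_device_initialize(struct mock_device *d)
{
    d->refcount = 1;
    d->added = false;
    d->release_calls = 0;
}

static void mock_put_device(struct mock_device *d)
{
    if (--d->refcount == 0)
        d->release_calls++;  /* the real object would be freed here */
}

static void mock_device_add(struct mock_device *d) { d->added = true; }
static void mock_device_del(struct mock_device *d) { d->added = false; }

/* Mirrors the kernel pairing: unregister = del + put. */
static void mock_device_unregister(struct mock_device *d)
{
    mock_device_del(d);
    mock_put_device(d);
}

/* Registration fails after add; the caller then deallocates the device. */
static int run_error_flow(bool use_device_del)
{
    struct mock_device d;

    mock_device_initialize(&d);      /* allocation stage */
    mock_device_add(&d);             /* sysfs registration stage */
    if (use_device_del)
        mock_device_del(&d);         /* fixed cleanup */
    else
        mock_device_unregister(&d);  /* old cleanup: drops the last reference */
    mock_put_device(&d);             /* caller's dealloc */
    return d.refcount;
}
```

`run_error_flow(true)` ends with a count of 0 (one clean release), while `run_error_flow(false)` ends at -1, the mock's equivalent of touching an object that was already released.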
      
      Fixes: 55aeed06 ("IB/core: Make ib_alloc_device init the kobject")
      Cc: <stable@vger.kernel.org> # v4.2+
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Fix kernel crash during fail to initialize device · 4be3a4fa
      Parav Pandit authored
      This patch fixes a kernel crash that occurs in ib_dealloc_device()
      when a provider driver fails with an error after ib_alloc_device()
      but before it can register using ib_register_device().
      
      The crash, seen in the lab and shown below, can occur with any IB
      device that fails its device initialization before invoking
      ib_register_device().
      
      This patch avoids touching the cache and port immutable structures if
      the device is not yet initialized.
      It also releases the related memory when initialization of the cache
      and port immutable data structures fails during device registration.
      
      [81416.561946] BUG: unable to handle kernel NULL pointer dereference at (null)
      [81416.570340] IP: ib_cache_release_one+0x29/0x80 [ib_core]
      [81416.576222] PGD 78da66067
      [81416.576223] PUD 7f2d7c067
      [81416.579484] PMD 0
      [81416.582720]
      [81416.587242] Oops: 0000 [#1] SMP
      [81416.722395] task: ffff8807887515c0 task.stack: ffffc900062c0000
      [81416.729148] RIP: 0010:ib_cache_release_one+0x29/0x80 [ib_core]
      [81416.735793] RSP: 0018:ffffc900062c3a90 EFLAGS: 00010202
      [81416.741823] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
      [81416.749785] RDX: 0000000000000000 RSI: 0000000000000282 RDI: ffff880859fec000
      [81416.757757] RBP: ffffc900062c3aa0 R08: ffff8808536e5ac0 R09: ffff880859fec5b0
      [81416.765708] R10: 00000000536e5c01 R11: ffff8808536e5ac0 R12: ffff880859fec000
      [81416.773672] R13: 0000000000000000 R14: ffff8808536e5ac0 R15: ffff88084ebc0060
      [81416.781621] FS:  00007fd879fab740(0000) GS:ffff88085fac0000(0000) knlGS:0000000000000000
      [81416.790522] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [81416.797094] CR2: 0000000000000000 CR3: 00000007eb215000 CR4: 00000000003406e0
      [81416.805051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [81416.812997] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [81416.820950] Call Trace:
      [81416.824226]  ib_device_release+0x1e/0x40 [ib_core]
      [81416.829858]  device_release+0x32/0xa0
      [81416.834370]  kobject_cleanup+0x63/0x170
      [81416.839058]  kobject_put+0x25/0x50
      [81416.843319]  ib_dealloc_device+0x25/0x40 [ib_core]
      [81416.848986]  mlx5_ib_add+0x163/0x1990 [mlx5_ib]
      [81416.854414]  mlx5_add_device+0x5a/0x160 [mlx5_core]
      [81416.860191]  mlx5_register_interface+0x8d/0xc0 [mlx5_core]
      [81416.866587]  ? 0xffffffffa09e9000
      [81416.870816]  mlx5_ib_init+0x15/0x17 [mlx5_ib]
      [81416.876094]  do_one_initcall+0x51/0x1b0
      [81416.880861]  ? __vunmap+0x85/0xd0
      [81416.885113]  ? kmem_cache_alloc_trace+0x14b/0x1b0
      [81416.890768]  ? vfree+0x2e/0x70
      [81416.894762]  do_init_module+0x60/0x1fa
      [81416.899441]  load_module+0x15f6/0x1af0
      [81416.904114]  ? __symbol_put+0x60/0x60
      [81416.908709]  ? ima_post_read_file+0x3d/0x80
      [81416.913828]  ? security_kernel_post_read_file+0x6b/0x80
      [81416.920006]  SYSC_finit_module+0xa6/0xf0
      [81416.924888]  SyS_finit_module+0xe/0x10
      [81416.929568]  entry_SYSCALL_64_fastpath+0x1a/0xa9
      [81416.935089] RIP: 0033:0x7fd879494949
      [81416.939543] RSP: 002b:00007ffdbc1b4e58 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
      [81416.947982] RAX: ffffffffffffffda RBX: 0000000001b66f00 RCX: 00007fd879494949
      [81416.955965] RDX: 0000000000000000 RSI: 000000000041a13c RDI: 0000000000000003
      [81416.963926] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000001b652a0
      [81416.971861] R10: 0000000000000003 R11: 0000000000000202 R12: 00007ffdbc1b3e70
      [81416.979763] R13: 00007ffdbc1b3e50 R14: 0000000000000005 R15: 0000000000000000
      [81417.008005] RIP: ib_cache_release_one+0x29/0x80 [ib_core] RSP: ffffc900062c3a90
      [81417.016045] CR2: 0000000000000000
      
      Fixes: 55aeed06 ("IB/core: Make ib_alloc_device init the kobject")
      Fixes: 7738613e ("IB/core: Add per port immutable struct to ib_device")
      Cc: <stable@vger.kernel.org> # v4.2+
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow · 3e31a490
      Feras Daoud authored
      rtnl_lock must be taken before ipoib_stop is called. The flow then
      clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP flags, and
      waits for the mcast completion if IPOIB_MCAST_FLAG_BUSY is set.
      
      Meanwhile, the multicast join task initializes a mcast completion,
      sets IPOIB_MCAST_FLAG_BUSY and calls ipoib_mcast_join. If the
      IPOIB_FLAG_OPER_UP flag is not set, this call returns EINVAL without
      signaling the mcast completion, which leads to a deadlock.
      
          ipoib_stop                          |
              |                               |
          clear_bit(IPOIB_FLAG_ADMIN_UP)      |
              |                               |
          Context Switch                      |
              |                       ipoib_mcast_join_task
              |                               |
              |                       spin_lock_irq(lock)
              |                               |
              |                       init_completion(mcast)
              |                               |
              |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
              |                               |
              |                       Context Switch
              |                               |
          clear_bit(IPOIB_FLAG_OPER_UP)       |
              |                               |
          spin_lock_irqsave(lock)             |
              |                               |
          Context Switch                      |
              |                       ipoib_mcast_join
              |                       return (-EINVAL)
              |                               |
              |                       spin_unlock_irq(lock)
              |                               |
              |                       Context Switch
              |                               |
          ipoib_mcast_dev_flush               |
          wait_for_completion(mcast)          |
      
      ipoib_stop will wait for the mcast completion forever and will
      not release the rtnl_lock. As a result, a panic occurs with the
      following trace:
      
          [13441.639268] Call Trace:
          [13441.640150]  [<ffffffff8168b579>] schedule+0x29/0x70
          [13441.641038]  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
          [13441.641914]  [<ffffffff810bc017>] ? complete+0x47/0x50
          [13441.642765]  [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
          [13441.643580]  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
          [13441.644434]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
          [13441.645293]  [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
          [13441.646159]  [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
          [13441.647013]  [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]
      
      Fixes: 08bc3276 ("IB/ipoib: fix for rare multicast join race condition")
      Signed-off-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/ipoib: Update broadcast object if PKey value was changed in index 0 · 9a9b8112
      Feras Daoud authored
      Update the broadcast address in the priv->broadcast object when the
      PKey value changes in index 0; otherwise the multicast GID keeps the
      previous PKey value and is not updated.
      This brings the interface state down, because the interface keeps the
      old PKey value.
      
      For example, in an SR-IOV environment, if the PF changes the value of
      PKey index 0 for one of the VFs, the VF receives a PKey change event
      that triggers a heavy flush. This flush calls update_parent_pkey, which
      updates the broadcast object and its relevant members. If the multicast
      GID is not updated in this case, the interface state will be down.
      
      Fixes: c2904141 ("IPoIB: Fix pkey change flow for virtualization environments")
      Signed-off-by: default avatarFeras Daoud <ferasda@mellanox.com>
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarAlex Vesker <valex@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      9a9b8112
    • IB/rxe: Cache dst in QP instead of getting it for each send · 4ed6ad1e
      yonatanc authored
      For an RC QP there is no need to resolve the outgoing interface
      for each packet, as it does not change during the QP life cycle.

      Instead, cache the interface on the socket and reuse it.
      This improves performance by 12% by sparing redundant
      calls to rxe_find_route.
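
      The caching idea can be sketched as follows. This is a simplified
      user-space illustration, not the rxe driver code: RouteTable,
      QueuePair and their methods are hypothetical names, and the route
      lookup is replaced by a stub that counts how often it is called.

```python
# Illustrative sketch only; all names here are hypothetical.

class RouteTable:
    """Stand-in for an expensive route lookup; counts how often it runs."""

    def __init__(self):
        self.lookups = 0

    def find_route(self, dest):
        self.lookups += 1
        return f"dst-for-{dest}"


class QueuePair:
    """A connected QP whose destination never changes during its lifetime."""

    def __init__(self, routes, dest):
        self._routes = routes
        self._dest = dest
        self._cached_dst = None  # resolved lazily on the first send

    def send(self, payload):
        # Resolve the route once and reuse it for every later packet.
        if self._cached_dst is None:
            self._cached_dst = self._routes.find_route(self._dest)
        return (self._cached_dst, payload)


routes = RouteTable()
qp = QueuePair(routes, "10.0.0.2")
for _ in range(100):
    qp.send(b"x")
```

      After 100 sends the stub lookup has run once, which is the saving the
      commit attributes to avoiding repeated rxe_find_route calls.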
      
      ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100
      
      ----------------------------------------------------------------------------------------
      |        | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
      ----------------------------------------------------------------------------------------
      | before | 1048576 | 9000       | inf             | 551.21             | 0.000551      |
      | after  | 1048576 | 9000       | inf             | 615.54             | 0.000616      |
      ----------------------------------------------------------------------------------------
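
      As a quick check, the quoted gain can be recomputed from the average
      bandwidth column of the table above. The helper name percent_gain is
      introduced here for illustration only.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Average bandwidth in MB/sec, taken from the "before" and "after" rows.
dst_cache_gain = percent_gain(551.21, 615.54)
```

      The result is a little under 12%, consistent with the rounded figure
      in the commit message.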
      
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: Offload CRC calculation when possible · cee2688e
      yonatanc authored
      Use the CPU's ability to perform CRC calculations by replacing
      direct calls to crc32_le() with crypto_shash_update().

      The overall performance gain measured with the ib_send_bw tool is 10%;
      it was tested on an "Intel CPU E5-2660 v2 @ 2.20GHz" CPU.
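
      The change relies on feeding data to a running checksum in pieces
      rather than in one call. The sketch below illustrates only that
      incremental pattern, using Python's zlib.crc32 as a stand-in. It does
      not show the kernel crypto API or hardware offload, and the helper
      name crc32_of_chunks and the sample buffers are hypothetical.

```python
import zlib


def crc32_of_chunks(chunks):
    """Fold each chunk into a running CRC32, one update per chunk."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # second argument is the running value
    return crc


# Hypothetical buffers standing in for a packet header and its payload.
header = b"\x01\x02\x03\x04"
payload = b"example payload"

chunked = crc32_of_chunks([header, payload])
one_shot = zlib.crc32(header + payload)
```

      Both values are equal, so the checksum can be built up piece by piece
      without first copying the pieces into one contiguous buffer.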
      
      ib_send_bw -d rxe0  -x 1 -n 9000 -e  -s $((1024 * 1024 )) -l 100
      
      ---------------------------------------------------------------------------------------------
      |             | bytes   | iterations | BW peak[MB/sec] | BW average[MB/sec] | MsgRate[Mpps] |
      ---------------------------------------------------------------------------------------------
      | crc32_le    | 1048576 | 9000       | inf             | 497.60             | 0.000498      |
      | CRC offload | 1048576 | 9000       | inf             | 546.70             | 0.000547      |
      ---------------------------------------------------------------------------------------------
      
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: Do not export module's private function · 0d38ac8a
      Parav Pandit authored
      Function rxe_rcv is used only internally in RXE and doesn't need to be
      exported. This patch removes the export declaration.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: Avoid accessing timers for non RC QPs · 99fc12f6
      Parav Pandit authored
      This patch skips initialization and cleanup of the RNR NAK timer and
      the retransmit timer for non-RC QPs (such as UD QP, GSI QP).
      Reviewed-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: Add port protocol stats · 0b1e5b99
      Yonatan Cohen authored
      Expose new counters using the get_hw_stats callback.
      We expose the following counters:
      
      +---------------------+----------------------------------------+
      |      Name           |           Description                  |
      |---------------------+----------------------------------------|
      |sent_pkts            | number of sent packets                 |
      |---------------------+----------------------------------------|
      |rcvd_pkts            | number of received packets             |
      |---------------------+----------------------------------------|
      |out_of_sequence      | number of errors due to packet         |
      |                     | transport sequence number              |
      |---------------------+----------------------------------------|
      |duplicate_request    | number of received duplicated packets. |
      |                     | A request that previously executed is  |
      |                     | named duplicated.                      |
      |---------------------+----------------------------------------|
      |rcvd_rnr_err         | number of received RNR by completer    |
      |---------------------+----------------------------------------|
      |send_rnr_err         | number of sent RNR by responder        |
      |---------------------+----------------------------------------|
      |rcvd_seq_err         | number of out of sequence packets      |
      |                     | received                               |
      |---------------------+----------------------------------------|
      |ack_deffered         | number of deferred handling of ack     |
      |                     | packets.                               |
      |---------------------+----------------------------------------|
      |retry_exceeded_err   | number of times retry exceeded         |
      |---------------------+----------------------------------------|
      |completer_retry_err  | number of times completer decided to   |
      |                     | retry                                  |
      |---------------------+----------------------------------------|
      |send_err             | number of failed send packets          |
      +---------------------+----------------------------------------+
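
      A minimal sketch of how such a fixed set of named counters could be
      kept and reported is shown below. It is an illustration only and does
      not reflect the driver's data structures or the get_hw_stats
      interface: the PortStats class and its methods are hypothetical, and
      only the counter names are taken from the table above.

```python
# Illustrative sketch only; PortStats and its methods are hypothetical.

COUNTER_NAMES = (
    "sent_pkts",
    "rcvd_pkts",
    "out_of_sequence",
    "duplicate_request",
    "rcvd_rnr_err",
    "send_rnr_err",
    "rcvd_seq_err",
    "ack_deffered",  # spelled as in the table above
    "retry_exceeded_err",
    "completer_retry_err",
    "send_err",
)


class PortStats:
    """Fixed set of named event counters for one port."""

    def __init__(self):
        self._values = {name: 0 for name in COUNTER_NAMES}

    def inc(self, name):
        # Unknown names raise KeyError instead of silently adding a counter.
        self._values[name] += 1

    def snapshot(self):
        """Return a copy so callers cannot modify the live counters."""
        return dict(self._values)


stats = PortStats()
stats.inc("sent_pkts")
stats.inc("sent_pkts")
stats.inc("send_err")
report = stats.snapshot()
```
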
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Andrew Boyer <andrew.boyer@dell.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>