1. 13 Nov, 2017 29 commits
  2. 10 Nov, 2017 11 commits
    • IB/mlx5: Add PCI write end padding support · b1383aa6
      Noa Osherovich authored
      Add the PCI write end padding flag to device_cap_flags enum and set it
      during mlx5_ib_query_device so it will be reported to user-space.
      
      During WQ/QP creation, set that capability for WQ/QP if user requested
      it and HW supports it.
      
      PCI write end padding modification is not supported for now. There's no
      such flag for a QP but for a WQ, create and modify use the same flag.
      Return an error if PCI write end padding flag is set during modify_wq.
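
      The create/modify asymmetry described above can be sketched in plain
      userspace C. All names here (the SKETCH_* bits, struct sketch_wq and
      both functions) are hypothetical stand-ins, not the mlx5 driver's
      identifiers, and failing creation with -EOPNOTSUPP when the hardware
      lacks the capability is an assumption about one reasonable behavior.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag and capability bits, for illustration only. */
#define SKETCH_WQ_FLAG_PCI_WRITE_END_PADDING	(1u << 0)
#define SKETCH_DEV_CAP_PCI_WRITE_END_PADDING	(1ull << 0)

struct sketch_wq {
	uint32_t flags;	/* flags accepted at creation time */
};

/* Create: accept the flag only when the device reports the capability. */
static int sketch_create_wq(struct sketch_wq *wq, uint64_t dev_caps,
			    uint32_t create_flags)
{
	if ((create_flags & SKETCH_WQ_FLAG_PCI_WRITE_END_PADDING) &&
	    !(dev_caps & SKETCH_DEV_CAP_PCI_WRITE_END_PADDING))
		return -EOPNOTSUPP;	/* assumed error code */

	wq->flags = create_flags;
	return 0;
}

/*
 * Modify: create and modify share the same flag bit, so the padding bit
 * has to be rejected explicitly when it shows up in a modify request.
 */
static int sketch_modify_wq(struct sketch_wq *wq, uint32_t flags_mask,
			    uint32_t flags)
{
	if ((flags_mask | flags) & SKETCH_WQ_FLAG_PCI_WRITE_END_PADDING)
		return -EOPNOTSUPP;	/* assumed error code */

	/* Apply only the bits selected by the mask. */
	wq->flags = (wq->flags & ~flags_mask) | (flags & flags_mask);
	return 0;
}
```

      In this sketch a WQ can gain the padding flag only at creation time;
      any later attempt to touch it through modify is refused.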

      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Add PCI write end padding flags for WQ and QP · e1d2e887
      Noa Osherovich authored
      Some root complexes are able to optimize their performance when
      incoming data is a multiple of full cache lines.
      
      PCI write end padding is the device's ability to pad the end of
      incoming packets (scatter) to a full cache line, so that the last
      upstream write generated by an incoming packet is a full cache
      line.
      
      Add a relevant entry to ib_device_cap_flags to report such capability
      of an RDMA device.
      
      Add the QP and WQ create flags:
       * A QP/WQ created with a scatter end padding flag will cause HW to
         pad the last upstream write generated by a packet to a full cache
         line.
      
      Users should consider several factors before activating this feature:
      - In case of high CPU memory load (which may in turn cause PCI back
        pressure), if a large percentage of the writes are partial cache
        lines, this feature should be evaluated as a possible solution.
      - This feature might reduce performance if most packets are between one
        and two cache lines and PCIe throughput has reached its maximum
        capacity. E.g. a 65B packet from the network port will lead to a 128B
        write on PCIe, which may push PCIe traffic toward its maximum
        throughput.
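
      The 65B to 128B example above is ordinary round-up arithmetic. A
      minimal sketch, assuming a 64-byte cache line (implied by the example
      but not stated) and a hypothetical helper name:

```c
#include <stddef.h>

/* Round len up to the next multiple of cache_line (cache_line > 0). */
static size_t padded_write_len(size_t len, size_t cache_line)
{
	return ((len + cache_line - 1) / cache_line) * cache_line;
}
```

      With a 64-byte line, a 65-byte packet is padded to 128 bytes, while
      packets of exactly 64 or 128 bytes are left unchanged. The worst case
      is therefore a packet one byte over a cache line boundary.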

      Signed-off-by: Noa Osherovich <noaos@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rxe: don't crash, if allocation of crc algorithm failed · 3192c53e
      Thomas Bogendoerfer authored
      The following crash happens if the crc algorithm cannot be allocated:
      
      [ 1087.989072] rdma_rxe: loaded
      [ 1097.855397] PCLMULQDQ-NI instructions are not detected.
      [ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
      [ 1097.901248] BUG: unable to handle kernel
      [ 1097.901249] NULL pointer dereference
      [ 1097.901250]  at 0000000000000046
      [...]
      
      The reason is that rxe->tfm is assigned the error return value, which
      is then passed to crypto_free_shash() in rxe_cleanup. Fix this by using
      a temporary variable and assigning it to rxe->tfm only after the
      allocation has succeeded.
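
      The temporary-variable pattern can be sketched in userspace C. The
      helpers below are simplified stand-ins for the kernel's error-pointer
      convention, and every sketch_* name is hypothetical; this is not the
      rxe code itself.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal userspace stand-ins for kernel-style error pointers. */
#define SKETCH_MAX_ERRNO 4095

static void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct sketch_dev {
	void *tfm;	/* either NULL or a valid allocation, never an error */
};

/* Pretend allocator: returns an encoded error when asked to fail. */
static void *sketch_alloc_crc(int fail)
{
	if (fail)
		return sketch_err_ptr(-ENOENT);
	return malloc(1);
}

static int sketch_dev_init(struct sketch_dev *dev, int fail)
{
	void *tfm = sketch_alloc_crc(fail);	/* temporary, not dev->tfm */

	if (sketch_is_err(tfm))
		return (int)sketch_ptr_err(tfm);

	dev->tfm = tfm;		/* published only after success */
	return 0;
}

static void sketch_dev_cleanup(struct sketch_dev *dev)
{
	free(dev->tfm);		/* safe: free(NULL) is a no-op */
	dev->tfm = NULL;
}
```

      Because the error value never reaches the device structure, the
      cleanup path only ever sees NULL or a real allocation.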
      
      Fixes: cee2688e ("IB/rxe: Offload CRC calculation when possible")
      Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Acked-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/core: Avoid crash on pkey enforcement failed in received MADs · 89548bca
      Parav Pandit authored
      The kernel crash below is observed when Pkey security enforcement
      fails on received MADs. This issue is reported in [1].

      ib_free_recv_mad() accesses the rmpp_list, which must be initialized
      before it is accessed.
      When security enforcement fails on received MADs, MAD processing is
      skipped because the security checks failed, and the received MAD is
      freed without rmpp_list having been initialized.
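
      The ordering requirement can be illustrated with a small userspace
      model of a circular list. The sketch_* types and functions are
      hypothetical and only mimic the shape of the problem: the free path
      walks the list, so the list head has to be valid before any early
      exit can reach the free path.

```c
#include <stddef.h>

struct sketch_list {
	struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

static void sketch_list_add_tail(struct sketch_list *node,
				 struct sketch_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

struct sketch_recv_mad {
	struct sketch_list rmpp_list;	/* walked by the free path */
	struct sketch_list buf;		/* the received buffer itself */
};

/* Walks rmpp_list and returns how many entries it released. */
static int sketch_free_recv_mad(struct sketch_recv_mad *mad)
{
	struct sketch_list *pos = mad->rmpp_list.next;
	int released = 0;

	while (pos != &mad->rmpp_list) {
		pos = pos->next;
		released++;
	}
	sketch_list_init(&mad->rmpp_list);
	return released;
}

static int sketch_recv_done(struct sketch_recv_mad *mad, int security_ok)
{
	/* Initialize first, so every exit path below can free safely. */
	sketch_list_init(&mad->rmpp_list);

	if (!security_ok)
		return sketch_free_recv_mad(mad);	/* early exit */

	sketch_list_add_tail(&mad->buf, &mad->rmpp_list);
	return sketch_free_recv_mad(mad);
}
```

      If the initialization were placed after the security check, the early
      exit would walk a list head containing whatever the memory held, which
      is the kind of NULL dereference seen in the trace.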
      
      OpenSM[3770]: SM port is down
      kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
      kernel: PGD 0
      kernel: P4D 0
      kernel:
      kernel: Oops: 0002 [#1] SMP
      kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P          IO    4.13.4-1-pve #1
      kernel: Hardware name: Dell       XS23-TY3        /9CMP63, BIOS 1.71 09/17/2013
      kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
      kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
      kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
      kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
      kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
      kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
      kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
      kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
      kernel: FS:  0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
      kernel: Call Trace:
      kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
      kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
      kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
      kernel:  process_one_work+0x1e9/0x410
      kernel:  worker_thread+0x4b/0x410
      kernel:  kthread+0x109/0x140
      kernel:  ? process_one_work+0x410/0x410
      kernel:  ? kthread_create_on_node+0x70/0x70
      kernel:  ? SyS_exit_group+0x14/0x20
      kernel:  ret_from_fork+0x25/0x30
      kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
      kernel: CR2: 0000000000000008
      
      [1] : https://www.spinics.net/lists/linux-rdma/msg56190.html
      
      Fixes: 47a2b338 ("IB/core: Enforce security on management datagrams")
      Cc: stable@vger.kernel.org # 4.13+
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reported-by: Chris Blake <chrisrblake93@gmail.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Hal Rosenstock <hal@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/cxgb4: Annotate r2 and stag as __be32 · 7d7d065a
      Leon Romanovsky authored
      Chelsio cxgb4 HW is big-endian, hence the r2 and stag fields need to
      be annotated as __be32 rather than __u32 to fix the following sparse
      warnings:
      
        drivers/infiniband/hw/cxgb4/qp.c:614:16:
          warning: incorrect type in assignment (different base types)
            expected unsigned int [unsigned] [usertype] r2
            got restricted __be32 [usertype] <noident>
        drivers/infiniband/hw/cxgb4/qp.c:615:18:
          warning: incorrect type in assignment (different base types)
            expected unsigned int [unsigned] [usertype] stag
            got restricted __be32 [usertype] <noident>
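
      What the annotation expresses can be shown in userspace C. Sparse's
      type checking is not available to a normal compiler, so the typedef
      below only documents intent; the struct and function names are
      hypothetical, and htonl() stands in for the kernel's cpu_to_be32().

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <stdint.h>

/*
 * Documents that the value is stored big-endian. In the kernel, __be32
 * is additionally checked by sparse; a plain typedef is not.
 */
typedef uint32_t sketch_be32;

struct sketch_wqe {
	sketch_be32 r2;
	sketch_be32 stag;
};

/* Convert from CPU byte order exactly once, at the point of assignment. */
static void sketch_fill_wqe(struct sketch_wqe *wqe, uint32_t stag_cpu)
{
	wqe->r2 = htonl(0);
	wqe->stag = htonl(stag_cpu);
}
```

      After sketch_fill_wqe(), the most significant byte of the stag is the
      first byte in memory regardless of host endianness, which is what a
      big-endian device expects to read.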
      
      Cc: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Fix RSS's QPC attributes assignments · 108809a0
      Guy Levi authored
      In the modify QP handler, the base_qpn_udp field in the RSS QPC is
      later overwritten by an irrelevant value assignment. Hence, ingress
      packets that reach the RSS QP are then steered to a garbage QPN.

      The patch fixes this by skipping that assignment when an RSS QP is
      modified. In addition, the RSS context attribute assignments are
      relocated to just before the context is posted, to avoid similar
      issues in the future.

      Additionally, this patch takes the opportunity to align the code with
      the device's manual and assigns the RSS QP context only at the
      RESET to INIT transition.
      
      Fixes: 3078f5f1 ("IB/mlx4: Add support for RSS QP")
      Signed-off-by: Guy Levi <guyle@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/mlx4: Add report for RSS capabilities by vendor channel · 09d208b2
      Guy Levi authored
      The mlx4 RSS patch submission omitted the report of RSS capabilities,
      which should be reported through the vendor channel in query_device.

      Signed-off-by: Guy Levi <guyle@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/umem: Avoid partial declaration of non-static function · fec99ede
      Leon Romanovsky authored
      RDMA/umem uses the generic RB-tree macros to generate various ib_umem
      access functions. The generation is performed with the
      INTERVAL_TREE_DEFINE macro, which allows one of two modes: declare all
      functions as static, or declare none of them as static.

      The second mode of operation produces the following sparse warnings:
       drivers/infiniband/core/umem_rbtree.c:69:1:
      	warning: symbol 'rbt_ib_umem_iter_first' was not declared.
      	Should it be static?
       drivers/infiniband/core/umem_rbtree.c:69:1:
      	warning: symbol 'rbt_ib_umem_iter_next' was not declared.
      	Should it be static?
      
      Relocating the code and declaring these functions "static" solves the
      issue.

      Because there is no need for a separate file holding two functions,
      consolidate umem_rbtree.c and umem_odp.c into one file.
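
      The "all static or none static" behavior of a generator macro can be
      sketched as follows. SKETCH_DEFINE_RANGE_OPS is a hypothetical,
      much simplified macro written for this illustration; it is not
      INTERVAL_TREE_DEFINE and does not share its parameter list.

```c
/*
 * Hypothetical generator macro: QUAL is pasted in front of every
 * generated function, so one argument decides the linkage of all of them.
 */
#define SKETCH_DEFINE_RANGE_OPS(QUAL, PREFIX, TYPE, START, LAST)	\
QUAL int PREFIX##_overlaps(const TYPE *n, unsigned long s,		\
			   unsigned long l)				\
{									\
	return n->START <= l && s <= n->LAST;				\
}									\
QUAL unsigned long PREFIX##_length(const TYPE *n)			\
{									\
	return n->LAST - n->START + 1;					\
}

struct sketch_umem {
	unsigned long start;
	unsigned long last;
};

/*
 * "static" mode: both helpers are file-local, so no prototype in a
 * header is needed. Passing an empty QUAL instead would emit both as
 * global symbols, and any of them lacking a prior declaration would
 * draw a "was not declared. Should it be static?" warning.
 */
SKETCH_DEFINE_RANGE_OPS(static, rbt_sketch, struct sketch_umem, start, last)
```

      Keeping the generated helpers static means they have to live in the
      same file as their callers, which is the motivation for merging the
      two files.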

      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/hns: Modify the usage of cmd_sn in hip08 · 26beb85f
      oulijun authored
      The cmd_sn field of the CQ doorbell is initialized to 0. It should be
      incremented on the first doorbell rung after each completion event.
      If two adjacent notify doorbells carry the same cmd_sn, the hardware
      treats them as the same notify request and updates its type according
      to the priority level of next event and solicited event.
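
      The counting rule can be modeled in a few lines of userspace C. The
      structure, the function names and in particular the 2-bit width of
      the sequence number are assumptions made for this sketch, not values
      taken from the hip08 driver.

```c
#include <stdint.h>

#define SKETCH_CMD_SN_MASK 0x3u	/* assumed field width: 2 bits */

struct sketch_cq {
	uint32_t cmd_sn;	/* starts at 0 */
	int event_seen;		/* set when a completion event arrives */
};

static void sketch_cq_init(struct sketch_cq *cq)
{
	cq->cmd_sn = 0;
	cq->event_seen = 0;
}

/* Called when a completion event is delivered for this CQ. */
static void sketch_cq_event(struct sketch_cq *cq)
{
	cq->event_seen = 1;
}

/* Returns the cmd_sn that would be written into the notify doorbell. */
static uint32_t sketch_ring_notify_db(struct sketch_cq *cq)
{
	/* Only the first doorbell after an event advances the number. */
	if (cq->event_seen) {
		cq->cmd_sn = (cq->cmd_sn + 1) & SKETCH_CMD_SN_MASK;
		cq->event_seen = 0;
	}
	return cq->cmd_sn;
}
```

      Repeated doorbells without an intervening event reuse the same
      number, which corresponds to the "same notify request" case
      described above.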

      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/hns: Unify the calculation for hem index in hip08 · 0203b14c
      oulijun authored
      The calculation of the hem index differs between hns_roce_table_get
      and hns_roce_table_find. When the table chunk size of TRRL is not
      divisible by the object size, the lookup of the TRRL table fails.

      This patch updates the hem index calculation in hns_roce_table_find
      to match the one in hns_roce_table_get.
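
      One plausible way two index calculations can disagree when the chunk
      size is not a multiple of the object size is shown below. Both
      formulas and their names are illustrative assumptions; the commit
      message does not spell out the exact expressions used by either
      function.

```c
/* Index by whole objects per chunk: a chunk holds floor(chunk/obj). */
static unsigned long sketch_hem_idx_by_objects(unsigned long obj,
					       unsigned long chunk_size,
					       unsigned long obj_size)
{
	return obj / (chunk_size / obj_size);
}

/* Index by byte offset: assumes objects are packed across chunks. */
static unsigned long sketch_hem_idx_by_bytes(unsigned long obj,
					     unsigned long chunk_size,
					     unsigned long obj_size)
{
	return (obj * obj_size) / chunk_size;
}
```

      With a 4096-byte chunk and 48-byte objects, a chunk holds 85 whole
      objects. Object 85 lands in chunk 1 by the first formula but in
      chunk 0 by the second, since 85 * 48 = 4080 is still below 4096.
      With 64-byte objects the two formulas agree, which is why the
      mismatch only appears for non-divisible sizes.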

      Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • RDMA/hns: Set the owner field of SQWQE in hip08 RoCE · e8d18533
      oulijun authored
      The owner field needs to be set when posting an SQ WQE in hip08 RoCE.
      It is set according to the following algorithm: the value of owner
      should be 1 in the first lap and 0 in the second lap, alternating on
      each subsequent lap.
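
      The lap rule can be written as a one-line computation on a
      free-running head counter. This is a sketch under two assumptions:
      the queue depth is a power of two, and the helper name and
      parameters are hypothetical.

```c
/*
 * head counts every WQE ever posted and is never reset; log_wqe_cnt is
 * log2 of the queue depth. head >> log_wqe_cnt is the lap number, and
 * the owner bit is the inverse of its lowest bit.
 */
static unsigned int sketch_sq_owner(unsigned int head,
				    unsigned int log_wqe_cnt)
{
	return ~(head >> log_wqe_cnt) & 0x1;
}
```

      For a queue of four entries (log_wqe_cnt = 2), heads 0 to 3 yield
      owner 1, heads 4 to 7 yield owner 0, and head 8 starts the third lap
      with owner 1 again.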

      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
      Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
      Signed-off-by: Yixian Liu <liuyixian@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>