1. 02 Oct, 2020 11 commits
    • net/mlx5e: Fix return status when setting unsupported FEC mode · 2608a2f8
      Aya Levin authored
      Verify the configured FEC mode is supported by at least a single link
      mode before applying the command. Otherwise fail the command and return
      "Operation not supported".
      Prior to this patch, the command was successful, yet it falsely set all
      link modes to FEC auto mode - like configuring FEC mode to auto. Auto
      mode is the default configuration if a link mode doesn't support the
      configured FEC mode.
      
      Fixes: b5ede32d ("net/mlx5e: Add support for FEC modes based on 50G per lane links")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
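The check described in this commit can be sketched in plain C as follows. This is a minimal illustration, not the driver's code: the FEC bit names, the `fec_mode_check` helper, and the `demo_supported` table are all hypothetical, and the only behavior taken from the commit message is "fail with 'Operation not supported' unless at least one link mode supports the requested FEC mode".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FEC mode bits; the real driver uses its own definitions. */
enum {
    FEC_NONE     = 1u << 0,
    FEC_FIRECODE = 1u << 1,
    FEC_RS_528   = 1u << 2,
    FEC_RS_544   = 1u << 3,
};

/* Demo data (assumed): FEC modes supported by each of three link modes. */
static const uint32_t demo_supported[] = {
    FEC_NONE | FEC_RS_528,
    FEC_NONE | FEC_RS_528,
    FEC_NONE | FEC_RS_544,
};

/*
 * supported[i] is the bitmask of FEC modes that link mode i can use.
 * Returns 0 when at least one link mode supports `requested`,
 * -EOPNOTSUPP otherwise, so the caller can reject the command up front.
 */
static int fec_mode_check(const uint32_t *supported, size_t n_link_modes,
                          uint32_t requested)
{
    for (size_t i = 0; i < n_link_modes; i++) {
        if (supported[i] & requested)
            return 0;
    }
    return -EOPNOTSUPP;
}
```

With the demo table, requesting `FEC_RS_544` passes the check, while `FEC_FIRECODE` is rejected because no link mode lists it.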
    • net/mlx5e: Fix driver's declaration to support GRE offload · 3d093bc2
      Aya Levin authored
      Declare GRE offload support with respect to the inner protocol. Add a
      list of supported inner protocols on which the driver can offload
      checksum and GSO. For other protocols, inform the stack to do the needed
      operations. There is no noticeable impact on GRE performance.
      
      Fixes: 27299841 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: CT, Fix coverity issue · 2b021989
      Maor Dickman authored
      The cited commit introduced the following coverity issue at function
      mlx5_tc_ct_rule_to_tuple_nat:
      - Memory - corruptions (OVERRUN)
        Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte
        elements at element index 7 (byte offset 31) using index
        "ip6_offset" (which evaluates to 7).
      
      In case of IPv6 destination address rewrite, ip6_offset values are
      between 4 and 7, which causes a memory overrun from array
      "tuple->ip.src_v6.in6_u.u6_addr32" into array
      "tuple->ip.dst_v6.in6_u.u6_addr32".

      Fixed by writing the value directly to array
      "tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are
      between 4 and 7.
      
      Fixes: bc562be9 ("net/mlx5e: CT: Save ct entries tuples in hashtables")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
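The shape of this fix can be illustrated with a small standalone sketch. The `ip6_tuple` struct and `ip6_tuple_set_word` helper are hypothetical stand-ins for the driver's tuple and rewrite logic, and the error code for out-of-range offsets is an assumption. The point is only that offsets 4..7 are written to the destination array directly instead of indexing past the end of the source array.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's tuple: two 4-word IPv6 addresses. */
struct ip6_tuple {
    uint32_t src[4];
    uint32_t dst[4];
};

/*
 * Store one rewritten 32-bit word. Offsets 0..3 address the source,
 * offsets 4..7 address the destination. Any other offset is rejected
 * (the -EINVAL return value is chosen for this sketch only).
 */
static int ip6_tuple_set_word(struct ip6_tuple *t, unsigned int ip6_offset,
                              uint32_t val)
{
    if (ip6_offset < 4)
        t->src[ip6_offset] = val;
    else if (ip6_offset < 8)
        t->dst[ip6_offset - 4] = val;   /* never index src[] past its end */
    else
        return -EINVAL;
    return 0;
}
```

Each array is only ever indexed with 0..3, so the result no longer depends on `dst` happening to follow `src` in memory.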
    • net/mlx5e: Add resiliency in Striding RQ mode for packets larger than MTU · c3c94023
      Aya Levin authored
      Prior to this fix, in Striding RQ mode the driver was vulnerable when
      receiving packets in the range (stride size - headroom, stride size],
      where stride size is calculated by mtu+headroom+tailroom aligned to the
      closest power of 2.
      Usually, this filtering is performed by the HW, except for a few cases:
      - Between 2 VFs over the same PF with different MTUs
      - On bluefield, when the host physical function sets a larger MTU than
        the ARM has configured on its representor and uplink representor.
      
      When the HW filtering is not present, packets that are larger than MTU
      might harm the RQ's integrity in the following ways:
      1) Overflow from one WQE to the next, causing a memory corruption that
      in most cases is harmless, as the write happens to the headroom of the
      next packet, which will be overwritten by build_skb(). In very rare
      cases, under high stress/load, this is harmful: when the next WQE is not
      yet reposted and points to an existing SKB head.
      2) Each oversize packet overflows to the headroom of the next WQE. On
      the last WQE of the WQ, where addresses wrap-around, the address of the
      remainder headroom does not belong to the next WQE, but it is out of the
      memory region range. This results in a HW CQE error that moves the RQ
      into an error state.
      
      Solution:
      Add a page buffer at the end of each WQE to absorb the leak. Actually
      the maximal overflow size is headroom but since all memory units must be
      of the same size, we use page size to comply with UMR WQEs. The increase
      in memory consumption is of a single page per RQ. Initialize the mkey
      with all MTTs pointing to a default page. When the channels are
      activated, UMR WQEs will redirect the RX WQEs to the actual memory from
      the RQ's pool, while the overflow MTTs remain mapped to the default page.
      
      Fixes: 73281b78 ("net/mlx5e: Derive Striding RQ size from MTU")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
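To make the vulnerable range concrete, here is a small arithmetic sketch. It assumes that "aligned to the closest power of 2" means rounding up, and the helper names and the example headroom/tailroom values used with it are illustrative only, not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round v up to a power of two. Valid for 1 <= v <= 2^31. */
static uint32_t roundup_pow2(uint32_t v)
{
    uint32_t p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/* Stride size as described above; rounding up is an assumption. */
static uint32_t stride_size(uint32_t mtu, uint32_t headroom, uint32_t tailroom)
{
    return roundup_pow2(mtu + headroom + tailroom);
}

/* True when len falls in the range (stride - headroom, stride]. */
static bool len_in_risky_range(uint32_t len, uint32_t stride, uint32_t headroom)
{
    return len > stride - headroom && len <= stride;
}
```

For example, with an MTU of 1500 and assumed headroom and tailroom of 256 and 320 bytes, the sum is 2076 and the stride becomes 4096, so lengths from 3841 through 4096 fall in the range the commit describes.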
    • net/mlx5e: Fix error path for RQ alloc · 08a762ce
      Aya Levin authored
      Increase granularity of the error path to avoid unneeded free/release.
      Fix the cleanup to be symmetric to the order of creation.
      
      Fixes: 0ddf5432 ("xdp/mlx5: setup xdp_rxq_info")
      Fixes: 422d4c40 ("net/mlx5e: RX, Split WQ objects for different RQ types")
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
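A fine-grained, symmetric error path of the kind this commit describes is commonly written with one unwind label per acquired resource. The sketch below is generic and does not reproduce the RQ allocation code: the three resources, the `acquire`/`release` helpers, and the trace string are hypothetical and exist only to show the release order.

```c
#include <string.h>

/* Demo bookkeeping: uppercase = acquired, lowercase = released. */
static char trace[16];
static int fail_at;   /* 1-based step that should fail; 0 = none */

static void note(char c)
{
    size_t n = strlen(trace);

    if (n + 1 < sizeof(trace)) {
        trace[n] = c;
        trace[n + 1] = '\0';
    }
}

static int acquire(int step, char name)
{
    if (step == fail_at)
        return -1;
    note(name);
    return 0;
}

static void release(char name)
{
    note(name);
}

/* Each failure jumps to the label that undoes only what already succeeded. */
static int setup(int fail_step)
{
    int err;

    trace[0] = '\0';
    fail_at = fail_step;

    err = acquire(1, 'A');
    if (err)
        goto err_out;
    err = acquire(2, 'B');
    if (err)
        goto err_release_a;
    err = acquire(3, 'C');
    if (err)
        goto err_release_b;
    return 0;

err_release_b:
    release('b');
err_release_a:
    release('a');
err_out:
    return err;
}
```

If the third step fails, the trace reads `ABba`: B is released before A, mirroring the order of creation, and nothing that was never acquired is released.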
    • net/mlx5: Fix request_irqs error flow · 732ebfab
      Maor Gottlieb authored
      Fix error flow handling in request_irqs, which tries to free an IRQ
      that we failed to request.
      This fixes the trace below.
      
      WARNING: CPU: 1 PID: 7587 at kernel/irq/manage.c:1684 free_irq+0x4d/0x60
      CPU: 1 PID: 7587 Comm: bash Tainted: G        W  OE    4.15.15-1.el7MELLANOXsmp-x86_64 #1
      Hardware name: Advantech SKY-6200/SKY-6200, BIOS F2.00 08/06/2020
      RIP: 0010:free_irq+0x4d/0x60
      RSP: 0018:ffffc9000ef47af0 EFLAGS: 00010282
      RAX: ffff88001476ae00 RBX: 0000000000000655 RCX: 0000000000000000
      RDX: ffff88001476ae00 RSI: ffffc9000ef47ab8 RDI: ffff8800398bb478
      RBP: ffff88001476a838 R08: ffff88001476ae00 R09: 000000000000156d
      R10: 0000000000000000 R11: 0000000000000004 R12: ffff88001476a838
      R13: 0000000000000006 R14: ffff88001476a888 R15: 00000000ffffffe4
      FS:  00007efeadd32740(0000) GS:ffff88047fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc9cc010008 CR3: 00000001a2380004 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       mlx5_irq_table_create+0x38d/0x400 [mlx5_core]
       ? atomic_notifier_chain_register+0x50/0x60
       mlx5_load_one+0x7ee/0x1130 [mlx5_core]
       init_one+0x4c9/0x650 [mlx5_core]
       pci_device_probe+0xb8/0x120
       driver_probe_device+0x2a1/0x470
       ? driver_allows_async_probing+0x30/0x30
       bus_for_each_drv+0x54/0x80
       __device_attach+0xa3/0x100
       pci_bus_add_device+0x4a/0x90
       pci_iov_add_virtfn+0x2dc/0x2f0
       pci_enable_sriov+0x32e/0x420
       mlx5_core_sriov_configure+0x61/0x1b0 [mlx5_core]
       ? kstrtoll+0x22/0x70
       num_vf_store+0x4b/0x70 [mlx5_core]
       kernfs_fop_write+0x102/0x180
       __vfs_write+0x26/0x140
       ? rcu_all_qs+0x5/0x80
       ? _cond_resched+0x15/0x30
       ? __sb_start_write+0x41/0x80
       vfs_write+0xad/0x1a0
       SyS_write+0x42/0x90
       do_syscall_64+0x60/0x110
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 24163189 ("net/mlx5: Separate IRQ request/free from EQ life cycle")
      Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
      Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
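The class of bug fixed here, freeing the one IRQ whose request failed, can be shown with a userspace sketch. Nothing below is the driver's code: `demo_request_irq`, `demo_free_irq`, and the counters are hypothetical, with `bad_frees` standing in for the kernel WARNING raised when an unrequested IRQ is freed.

```c
#include <stdbool.h>

#define NVEC 4

/* Demo state: which vectors currently hold a requested IRQ. */
static bool irq_held[NVEC];
static int free_calls;
static int bad_frees;   /* frees of a vector that was never requested */

/* Hypothetical request: fails for vector `fail_vec`, succeeds otherwise. */
static int demo_request_irq(int vec, int fail_vec)
{
    if (vec == fail_vec)
        return -1;
    irq_held[vec] = true;
    return 0;
}

static void demo_free_irq(int vec)
{
    if (!irq_held[vec])
        bad_frees++;
    irq_held[vec] = false;
    free_calls++;
}

static int request_all(int fail_vec)
{
    int i, err;

    for (i = 0; i < NVEC; i++)
        irq_held[i] = false;
    free_calls = 0;
    bad_frees = 0;

    for (i = 0; i < NVEC; i++) {
        err = demo_request_irq(i, fail_vec);
        if (err)
            goto err_unwind;
    }
    return 0;

err_unwind:
    /* i is the vector that failed; free only vectors 0..i-1. */
    while (i--)
        demo_free_irq(i);
    return err;
}
```

The `while (i--)` form decrements before the first free, so the failed vector itself is skipped. An unwind loop that starts at `i` would free it and trip the warning.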
    • net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible · b898ce7b
      Saeed Mahameed authored
      In case PCI is offline, reclaim_pages_cmd() will still try to call
      the FW to release FW pages; cmd_exec() in this case will return a silent
      success without actually calling the FW.

      This is wrong and will cause page leaks. What we should do is detect
      PCI offline or command interface unavailability before trying to access
      the FW, and manually release the FW pages in the driver.
      
      In this patch we share the code to check for FW command interface
      availability and we call it in sensitive places e.g. reclaim_pages_cmd().
      
      Alternative fix:
       1. Remove MLX5_CMD_OP_MANAGE_PAGES from mlx5_internal_err_ret_value,
          command success simulation list.
       2. Always release FW pages even if cmd_exec fails in reclaim_pages_cmd().
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Add retry mechanism to the command entry index allocation · 410bd754
      Eran Ben Elisha authored
      It is possible that new command entry index allocation will temporarily
      fail. The new command holds the semaphore, so it means that a free entry
      should be ready soon. Add one second retry mechanism before returning an
      error.
      
      Patch "net/mlx5: Avoid possible free of command entry while timeout comp
      handler" increases the possibility of bumping into this temporary failure
      as it delays the entry index release for non-callback commands.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
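A bounded retry of this kind can be sketched as below. It is an illustration under stated assumptions: the allocator and the millisecond clock are passed in as hypothetical callbacks so the loop can be exercised deterministically, whereas the kernel would use its own time source and would not spin in this way in userspace.

```c
#include <stdint.h>

typedef int (*alloc_fn)(void);          /* returns an index, or < 0 on failure */
typedef uint64_t (*clock_ms_fn)(void);  /* monotonic time in milliseconds */

/*
 * Keep retrying the allocation until it succeeds or `timeout_ms` has
 * elapsed, then return the last result (an index, or a negative error).
 */
static int alloc_index_retry(alloc_fn alloc, clock_ms_fn now_ms,
                             uint64_t timeout_ms)
{
    uint64_t deadline = now_ms() + timeout_ms;
    int idx;

    do {
        idx = alloc();
    } while (idx < 0 && now_ms() < deadline);

    return idx;
}

/* Demo helpers (assumed): a fake clock and two fake allocators. */
static uint64_t fake_now;
static int failures_left;

static uint64_t fake_clock_ms(void)
{
    uint64_t t = fake_now;

    fake_now += 100;   /* every reading advances time by 100 ms */
    return t;
}

static int flaky_alloc(void)
{
    if (failures_left > 0) {
        failures_left--;
        return -1;
    }
    return 5;
}

static int never_alloc(void)
{
    return -1;
}
```

With a one-second budget, `flaky_alloc` succeeds once its temporary failures run out, while `never_alloc` makes the loop give up at the deadline and return the error.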
    • net/mlx5: poll cmd EQ in case of command timeout · 1d5558b1
      Eran Ben Elisha authored
      Once the driver detects a command interface command timeout, it warns
      the user and returns a timeout error to the caller. In such a case, the
      entry of the command is not evacuated (because only a real event
      interrupt is allowed to clear a command interface entry). If the HW
      event interrupt of this entry never arrives, the entry is left unused
      forever. Command interface entries are limited and eventually we can end
      up without the ability to post a new command.

      In addition, if the driver does not consume the EQE of the lost interrupt
      and rearm the EQ, no new interrupts will arrive for other commands.
      
      Add a resiliency mechanism for manually polling the command EQ in case of
      a command timeout. If the resiliency mechanism finds a non-handled EQE,
      it consumes it, and the command interface becomes fully functional
      again. Once the resiliency flow has finished, wait another 5 seconds for
      the command interface to complete for this command entry.
      
      Define mlx5_cmd_eq_recover() to manage the cmd EQ polling resiliency flow.
      Add an async EQ spinlock to avoid races between resiliency flows and real
      interrupts that might run simultaneously.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid possible free of command entry while timeout comp handler · 50b2412b
      Eran Ben Elisha authored
      Upon command completion timeout, driver simulates a forced command
      completion. In a rare case where real interrupt for that command arrives
      simultaneously, it might release the command entry while the forced
      handler might still access it.
      
      Fix that by adding an entry refcount to track the current number of
      allowed handlers. The command entry is released only when this refcount
      is decremented to zero.
      
      Command refcount is always initialized to one. For callback commands,
      command completion handler is the symmetric flow to decrement it. For
      non-callback commands, it is wait_func().
      
      Before ringing the doorbell, increment the refcount for the real completion
      handler. Once the real completion handler is called, it will decrement it.
      
      For callback commands, once the delayed work is scheduled, increment the
      refcount. Upon callback command completion handler, we will try to cancel
      the timeout callback. In case of success, we need to decrement the callback
      refcount as it will never run.
      
      In addition, gather the entry index free and the entry free into one
      flow for the release of all command types.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
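The refcount scheme for a non-callback command can be sketched with C11 atomics. The `cmd_entry` struct and its helpers are hypothetical names for this illustration, and the `released` flag merely stands in for freeing the entry and its index. The kernel would use its own refcount primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct cmd_entry {
    atomic_int refcnt;
    bool released;   /* stands in for freeing the entry and its index */
};

/* The refcount always starts at one, owned by the issuing flow. */
static void cmd_entry_init(struct cmd_entry *ent)
{
    atomic_init(&ent->refcnt, 1);
    ent->released = false;
}

/* Taken before ringing the doorbell, on behalf of the real completion. */
static void cmd_entry_get(struct cmd_entry *ent)
{
    atomic_fetch_add(&ent->refcnt, 1);
}

/* Whoever drops the last reference performs the release. */
static void cmd_entry_put(struct cmd_entry *ent)
{
    if (atomic_fetch_sub(&ent->refcnt, 1) == 1)
        ent->released = true;
}
```

In this model the completion handler and the waiting flow each call `cmd_entry_put()` once. Whichever runs last sees the count reach zero, so neither can release the entry while the other may still access it.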
    • net/mlx5: Fix a race when moving command interface to polling mode · 432161ea
      Eran Ben Elisha authored
      As part of driver unload, the driver destroys the commands EQ (via FW
      command). Once the commands EQ is destroyed, FW will not generate EQEs
      for any command that the driver sends afterwards, so the driver should
      poll for the status of later commands.

      Driver commands mode metadata is updated before the commands EQ is
      actually destroyed. This can lead to double completion handling by the
      driver (polling and interrupt), if a command is executed and completed by
      FW after the mode was changed, but before the EQ was destroyed.
      
      Fix that by using the mlx5_cmd_allowed_opcode mechanism to guarantee
      that only DESTROY_EQ command can be executed during this time period.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 30 Sep, 2020 12 commits
  3. 29 Sep, 2020 16 commits
  4. 28 Sep, 2020 1 commit
    • net: qrtr: ns: Protect radix_tree_deref_slot() using rcu read locks · a7809ff9
      Manivannan Sadhasivam authored
      The rcu read locks are needed to avoid a potential race condition while
      dereferencing the radix tree from multiple threads. The issue was
      identified by syzbot. Below is the crash report:
      
      =============================
      WARNING: suspicious RCU usage
      5.7.0-syzkaller #0 Not tainted
      -----------------------------
      include/linux/radix-tree.h:176 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      2 locks held by kworker/u4:1/21:
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: spin_unlock_irq include/linux/spinlock.h:403 [inline]
       #0: ffff88821b097938 ((wq_completion)qrtr_ns_handler){+.+.}-{0:0}, at: process_one_work+0x6df/0xfd0 kernel/workqueue.c:2241
       #1: ffffc90000dd7d80 ((work_completion)(&qrtr_ns.work)){+.+.}-{0:0}, at: process_one_work+0x71e/0xfd0 kernel/workqueue.c:2243
      
      stack backtrace:
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: qrtr_ns_handler qrtr_ns_worker
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1e9/0x30e lib/dump_stack.c:118
       radix_tree_deref_slot include/linux/radix-tree.h:176 [inline]
       ctrl_cmd_new_lookup net/qrtr/ns.c:558 [inline]
       qrtr_ns_worker+0x2aff/0x4500 net/qrtr/ns.c:674
       process_one_work+0x76e/0xfd0 kernel/workqueue.c:2268
       worker_thread+0xa7f/0x1450 kernel/workqueue.c:2414
       kthread+0x353/0x380 kernel/kthread.c:268
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Reported-and-tested-by: syzbot+0f84f6eed90503da72fc@syzkaller.appspotmail.com
      Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>