1. 04 Jun, 2021 23 commits
  2. 03 Jun, 2021 17 commits
    • netdevsim: Fix unsigned being compared to less than zero · ebbf5fcb
      Colin Ian King authored
      The comparison of len < 0 is always false because len is a size_t. Fix
      this by making len a ssize_t instead.
      
      Addresses-Coverity: ("Unsigned compared against 0")
      Fixes: d3953819 ("netdevsim: Add max_vfs to bus_dev")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
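
The pitfall described in this commit can be reproduced in a few lines of userspace C. This is a minimal sketch, not the netdevsim code: `fake_format()`, `check_unsigned()` and `check_signed()` are hypothetical names, and the error value is arbitrary.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical helper: returns a byte count, or a negative error code. */
static ssize_t fake_format(int fail)
{
	return fail ? -22 : 4;
}

/* Buggy pattern: len is unsigned, so the error check can never fire. */
static int check_unsigned(int fail)
{
	size_t len = fake_format(fail);	/* a negative value wraps to a huge one */

	if (len < 0)			/* always false for an unsigned type */
		return -1;
	return 0;
}

/* Fixed pattern: a signed type preserves the negative error code. */
static int check_signed(int fail)
{
	ssize_t len = fake_format(fail);

	if (len < 0)
		return -1;
	return 0;
}
```

With a failing helper, `check_unsigned(1)` silently reports success while `check_signed(1)` reports the error, which is the behavioural difference the type change restores.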
    • icmp: fix lib conflict with trinity · e32ea44c
      Andreas Roeseler authored
      Including <linux/in.h> and <netinet/in.h> in the dependencies breaks
      compilation of trinity due to multiple definitions. <linux/in.h> is only
      used in <linux/icmp.h> to provide the definition of the struct in_addr,
      but this can be substituted out by using the datatype __be32.

      Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: rmnet: Restructure if checks to avoid uninitialized warning · 118de610
      Nathan Chancellor authored
      Clang warns that proto in rmnet_map_v5_checksum_uplink_packet() might be
      used uninitialized:
      
      drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:283:14: warning:
      variable 'proto' is used uninitialized whenever 'if' condition is false
      [-Wsometimes-uninitialized]
                      } else if (skb->protocol == htons(ETH_P_IPV6)) {
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:295:36: note:
      uninitialized use occurs here
                      check = rmnet_map_get_csum_field(proto, trans);
                                                       ^~~~~
      drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:283:10: note:
      remove the 'if' if its condition is always true
                      } else if (skb->protocol == htons(ETH_P_IPV6)) {
                             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c:270:11: note:
      initialize the variable 'proto' to silence this warning
                      u8 proto;
                              ^
                               = '\0'
      1 warning generated.
      
      This is technically a false positive because there is an if statement
      above this one that checks skb->protocol for not being either
      ETH_P_IP{,V6}. However, it is more obvious to sink that into the if
      statement as an else branch, which makes the code clearer and fixes the
      warning.
      
      At the same time, move the "IS_ENABLED(CONFIG_IPV6)" into the else if
      condition so that the else branch of the preprocessor conditional can
      be shared, since there is no build failure with CONFIG_IPV6 disabled.
      
      Fixes: b6e5d27e ("net: ethernet: rmnet: Add support for MAPv5 egress packets")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1390
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
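
The restructuring described above can be sketched in standalone C. This is an illustration under assumptions, not the rmnet code: `demo_l4_proto()` and the `DEMO_*` constants are hypothetical, and EtherType values are compared in host byte order for simplicity.

```c
#include <stdint.h>

/* EtherType values in host byte order, for illustration only. */
#define DEMO_ETH_P_IP	0x0800
#define DEMO_ETH_P_IPV6	0x86DD

/*
 * Returns the L4 protocol number, or -1 for an unsupported EtherType.
 * Every path either assigns proto or returns early, so the compiler can
 * see that proto is never read uninitialized.
 */
static int demo_l4_proto(uint16_t ethertype, uint8_t v4_proto,
			 uint8_t v6_nexthdr)
{
	uint8_t proto;

	if (ethertype == DEMO_ETH_P_IP) {
		proto = v4_proto;
	} else if (ethertype == DEMO_ETH_P_IPV6) {
		proto = v6_nexthdr;
	} else {
		/* Unsupported protocol: bail out before proto is read. */
		return -1;
	}

	return proto;
}
```

Folding the "unsupported" case into the final else branch keeps the rejection next to the assignments it guards, instead of relying on a separate check earlier in the function.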
    • net: ks8851: Make ks8851_read_selftest() return void · 819fb78f
      Nathan Chancellor authored
      clang points out that ret in ks8851_read_selftest() is set but unused:
      
      drivers/net/ethernet/micrel/ks8851_common.c:1028:6: warning: variable
      'ret' set but not used [-Wunused-but-set-variable]
              int ret = 0;
                  ^
      1 warning generated.
      
      The return code of this function has never been checked so just remove
      ret and make the function return void.
      
      Fixes: 3ba81f3e ("net: Micrel KS8851 SPI network driver")
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_htb: fix doc warning in htb_add_to_id_tree() · a10541f5
      Yu Kuai authored
      Add description for parameters of htb_add_to_id_tree() to fix
      gcc W=1 warnings:
      net/sched/sch_htb.c:282: warning: Function parameter or member 'root' not described in 'htb_add_to_id_tree'
      net/sched/sch_htb.c:282: warning: Function parameter or member 'cl' not described in 'htb_add_to_id_tree'
      net/sched/sch_htb.c:282: warning: Function parameter or member 'prio' not described in 'htb_add_to_id_tree'

      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: remove redundant initialization of variable ret · 92e1b57c
      Colin Ian King authored
      The variable ret is being initialized with a value that is never read,
      it is being updated later on.  The assignment is redundant and can be
      removed.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: marvell: use phy_modify_changed() for marvell_set_polarity() · feb938fa
      Russell King authored
      Rather than open-coding the phy_modify_changed() sequence, use this
      helper in marvell_set_polarity().

      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
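
For readers unfamiliar with the helper, the read-modify-write sequence that phy_modify_changed() wraps can be approximated in userspace. The sketch below is an assumption-laden stand-in: `demo_regs` and `demo_modify_changed()` are hypothetical, and a plain array replaces real MDIO register access. It mirrors the documented return convention of negative on error, 0 if nothing changed, 1 if the register was updated.

```c
#include <stdint.h>

#define DEMO_NUM_REGS 32

/* Hypothetical register file standing in for a PHY's MDIO registers. */
static uint16_t demo_regs[DEMO_NUM_REGS];

/*
 * Clear the bits in mask, set the bits in set, and write the register
 * back only if the resulting value differs from the current one.
 * Returns a negative value on error, 0 if unchanged, 1 if changed.
 */
static int demo_modify_changed(unsigned int regnum, uint16_t mask,
			       uint16_t set)
{
	uint16_t old, val;

	if (regnum >= DEMO_NUM_REGS)
		return -1;

	old = demo_regs[regnum];
	val = (uint16_t)((old & ~mask) | set);
	if (val == old)
		return 0;

	demo_regs[regnum] = val;
	return 1;
}
```

A caller can use the 0/1 distinction to decide whether a follow-up action (for example a soft reset) is needed, which is what open-coded versions typically track with a local "changed" flag.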
    • Merge branch 'ipa-inline-csum' · e5118f57
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: support inline checksum offload
      
      Inline offload--required for checksum offload support on IPA version
      4.5 and above--is now supported by the RMNet driver:
        https://lore.kernel.org/netdev/162259440606.2786.10278242816453240434.git-patchwork-notify@kernel.org/
      
      Add support for it in the IPA driver, and revert the commit that
      disabled it pending acceptance of the RMNet code.
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: ipa: disable checksum offload for IPA v4.5+" · d15ec193
      Alex Elder authored
      This reverts commit c88c34fc.
      
      The RMNet driver now supports inline checksum offload.

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipa: add support for inline checksum offload · 5567d4d9
      Alex Elder authored
      Starting with IPA v4.5, IP payload checksum offload is implemented
      differently.
      
      Prior to v4.5, the IPA hardware appends an rmnet_map_dl_csum_trailer
      structure to each packet if checksum offload is enabled in the
      download direction (modem->AP).  In the upload direction (AP->modem)
      a rmnet_map_ul_csum_header structure is prepended before each sent
      packet.
      
      Starting with IPA v4.5, checksum offload is implemented using a
      single new rmnet_map_v5_csum_header structure which sits between
      the QMAP header and the packet data.  The same header structure
      is used in both directions.
      
      The new header contains a header type (CSUM_OFFLOAD); a checksum
      flag; and a flag indicating whether any other headers follow this
      one.  The checksum flag indicates whether the hardware should
      compute (and insert) the checksum on a sent packet.  On a received
      packet the checksum flag indicates whether the hardware confirms the
      checksum value in the payload is correct.

      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
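
The three fields the commit message lists (header type, checksum flag, next-header flag) can be pictured with a small sketch. Everything here is hypothetical: the `demo_*` names, the bit positions, and the numeric header type are assumptions chosen for illustration and should not be read as the actual rmnet_map_v5_csum_header layout.

```c
#include <stdint.h>

/* Hypothetical 4-byte inline header; the field layout is an assumption. */
struct demo_csum_header {
	uint8_t header_info;	/* bit 0: next-header flag, bits 7..1: type */
	uint8_t csum_info;	/* bit 7: checksum flag */
	uint16_t reserved;
};

#define DEMO_NEXT_HDR_FLAG	0x01u
#define DEMO_HDR_TYPE_SHIFT	1
#define DEMO_HDR_TYPE_CSUM	2u	/* illustrative CSUM_OFFLOAD value */
#define DEMO_CSUM_FLAG		0x80u

/* TX: optionally ask the hardware to compute and insert the checksum. */
static struct demo_csum_header demo_tx_header(int want_csum)
{
	struct demo_csum_header hdr = { 0 };

	hdr.header_info = (uint8_t)(DEMO_HDR_TYPE_CSUM << DEMO_HDR_TYPE_SHIFT);
	if (want_csum)
		hdr.csum_info |= DEMO_CSUM_FLAG;
	return hdr;
}

/* RX: 1 if this is a checksum header and the hardware marked it valid. */
static int demo_rx_csum_ok(const struct demo_csum_header *hdr)
{
	unsigned int type = hdr->header_info >> DEMO_HDR_TYPE_SHIFT;

	return type == DEMO_HDR_TYPE_CSUM &&
	       (hdr->csum_info & DEMO_CSUM_FLAG) != 0;
}
```

The point of the sketch is that one structure serves both directions: the same flag means "please compute" on transmit and "verified correct" on receive.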
    • Merge tag 'mlx5-updates-2021-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · fcd1a530
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      This series provides misc updates for mlx5 drivers.
      For more information please see tag log below.
      
      Please pull and let me know if there is any problem.
      
      mlx5-updates-2021-06-03
      
      This series contains misc updates for mlx5 driver
      
      1) Alaa disables advanced features when in kdump mode to save on memory
      2) Jakub counts all link flap events
      3) Meir adds support for IPoIB NDR speed
      4) Various misc cleanup
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:cxgb3: fix code style issues · 6a8dd8b2
      Íñigo Huguet authored
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:cxgb3: replace tasklets with works · 5e0b8928
      Íñigo Huguet authored
      OFLD and CTRL TX queues can be stopped if there is no room in
      their DMA rings. If this happens, the driver tries to restart them
      later, after some room has been made in the corresponding ring.
      
      Restarting these queues was triggered from tasklets, but these
      can be replaced by workqueue work items, which moves the restart
      out of softirq context.
      
      These queue stops/restarts probably don't happen often, and a
      restart can be quite lengthy because it tries to send all pending
      skbs. Moreover, the ring is probably not empty yet, so the DMA
      still has work to do; the restart does not need to be fast enough
      to justify using tasklets/softirq instead of running in a thread.

      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp better handling of reordering then loss cases · a29cb691
      Yuchung Cheng authored
      This patch aims to improve the situation when reordering and loss are
      occurring in the same flight of packets.
      
      Previously the reordering would first induce a spurious recovery, then
      the subsequent ACK may undo the cwnd (based on the timestamps e.g.).
      However the current loss recovery does not proceed to invoke
      RACK to install a reordering timer. If some packets are also lost, this
      may lead to a long RTO-based recovery. An example is
      https://groups.google.com/g/bbr-dev/c/OFHADvJbTEI
      
      The solution is to always invoke RACK after reverting the recovery,
      to either arm the RACK timer to fast retransmit after the reordering
      window, or restart the recovery if new loss is identified. Hence
      it is possible the sender may go from Recovery to Disorder/Open to
      Recovery again in one ACK.

      Reported-by: mingkun bian <bianmingkun@gmail.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bonding: Use strscpy_pad() instead of manually-truncated strncpy() · 43902070
      Kees Cook authored
      Silence this warning by using strscpy_pad() directly:
      
      drivers/net/bonding/bond_main.c:4877:3: warning: 'strncpy' specified bound 16 equals destination size [-Wstringop-truncation]
          4877 |   strncpy(params->primary, primary, IFNAMSIZ);
               |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Additionally replace other strncpy() uses, as it is considered deprecated:
      https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
      
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/lkml/202102150705.fdR6obB0-lkp@intel.com
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
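
To show why strscpy_pad() avoids the truncation warning, here is a userspace approximation of its behaviour. `demo_strscpy_pad()` is a hypothetical stand-in written for this note, not the kernel implementation; it returns -1 on truncation where the kernel helper returns -E2BIG.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy at most size - 1 bytes from src, always NUL-terminate dst, and
 * zero-fill whatever remains of the buffer. Returns the number of bytes
 * copied (excluding the NUL), or -1 if src had to be truncated.
 */
static long demo_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;

	/* Length of src, capped at size. */
	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		/* src does not fit: keep size - 1 bytes plus the NUL. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len);
	memset(dst + len, 0, size - len);	/* terminator and padding */
	return (long)len;
}
```

Unlike strncpy() with a bound equal to the destination size, the destination is always NUL-terminated, and the return value lets the caller detect truncation.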
    • net: vlan: Avoid using strncpy() · 9c153d38
      Kees Cook authored
      Use strscpy_pad() instead of strncpy() which is considered deprecated:
      https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'NVMeTCP-Offload-ULP' · 5ff5622e
      David S. Miller authored
      Shai Malin says:
      
      ====================
      NVMeTCP Offload ULP
      
      With the goal of enabling a generic infrastructure that allows NVMe/TCP
      offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
      patch series introduces the nvme-tcp-offload ULP host layer, which will
      be a new transport type called "tcp-offload" and will serve as an
      abstraction layer to work with vendor specific nvme-tcp offload drivers.
      
      NVMeTCP offload is a full offload of the NVMeTCP protocol; this includes
      both the TCP level and the NVMeTCP level.
      
      The nvme-tcp-offload transport can co-exist with the existing tcp and
      other transports. The tcp offload was designed so that stack changes are
      kept to a bare minimum: only registering new transports.
      All other APIs, ops etc. are identical to the regular tcp transport.
      Representing the TCP offload as a new transport allows clear and manageable
      differentiation between the connections which should use the offload path
      and those that are not offloaded (even on the same device).
      
      The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
      
      * NVMe layer: *
      
             [ nvme/nvme-fabrics/blk-mq ]
                   |
              (nvme API and blk-mq API)
                   |
                   |
      * Vendor agnostic transport layer: *
      
            [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
                   |        |             |
                 (Verbs)
                   |        |             |
                   |     (Socket)
                   |        |             |
                   |        |        (nvme-tcp-offload API)
                   |        |             |
                   |        |             |
      * Vendor Specific Driver: *
      
                   |        |             |
                 [ qedr ]
                            |             |
                         [ qede ]
                                          |
                                        [ qedn ]
      
      Performance:
      ============
      With this implementation on top of the Marvell qedn driver (using the
      Marvell FastLinQ NIC), we were able to demonstrate the following CPU
      utilization improvement:
      
      On AMD EPYC 7402, 2.80GHz, 28 cores:
      - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
        Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
        NVMeTCP offload.
      
      On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
      - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
        Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
        NVMeTCP offload.
      
      In addition, we were able to demonstrate the following latency improvement:
      - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
        Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
        with NVMeTCP offload.
      
        Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec
        with NVMeTCP offload.
      
      The end-to-end offload latency was measured from fio while running against
      back end of null device.
      
      Upstream plan:
      ==============
      The RFC series "NVMeTCP Offload ULP and QEDN Device Driver"
      https://lore.kernel.org/netdev/20210531225222.16992-1-smalin@marvell.com/
      was designed in a modular way so that part 1 (nvme-tcp-offload) and
      part 2 (qed) are independent and part 3 (qedn) depends on both parts 1+2.
      
      - Part 1 (RFC patches 1-8): NVMeTCP Offload ULP
        The nvme-tcp-offload patches will be sent to
        'linux-nvme@lists.infradead.org'.
      
      - Part 2 (RFC patches 9-15): QED NVMeTCP Offload
        The qed infrastructure will be sent to 'netdev@vger.kernel.org'.
      
      Once parts 1 and 2 are accepted:
      
      - Part 3 (RFC patches 16-27): QEDN NVMeTCP Offload
        The qedn patches will be sent to 'linux-nvme@lists.infradead.org'.
      
      Marvell is fully committed to maintain, test, and address issues with
      the new nvme-tcp-offload layer.
      
      Usage:
      ======
      With the Marvell NVMeTCP offload design, the network-device (qede) and the
      offload-device (qedn) are paired on each port - Logically similar to the
      RDMA model.
      The user will interact with the network-device in order to configure
      the ip/vlan. The NVMeTCP configuration is populated as part of the
      nvme connect command.
      
      Example:
      Assign IP to the net-device (from any existing Linux tool):
      
          ip addr add 100.100.0.101/24 dev p1p1
      
      This IP will be used by both net-device (qede) and offload-device (qedn).
      
      In order to connect from "sw" nvme-tcp through the net-device (qede):
      
          nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn
      
      In order to connect from "offload" nvme-tcp through the offload-device (qedn):
      
          nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn
      
      An alternative approach, as a future enhancement that will not impact
      this series, would be to modify nvme-cli with a new flag that determines
      whether "-t tcp" should be the regular nvme-tcp (which will be the
      default) or nvme-tcp-offload.
      Example:
          nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]
      
      Queue Initialization Design:
      ============================
      The nvme-tcp-offload ULP module shall register with the existing
      nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following ops:
      - claim_dev() - in order to resolve the route to the target according to
                      the paired net_dev.
      - create_queue() - in order to create offloaded nvme-tcp queue.
      
      The nvme-tcp-offload ULP module shall manage all the controller level
      functionalities, call claim_dev and based on the return values shall call
      the relevant module create_queue in order to create the admin queue and
      the IO queues.
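
The registration and queue-creation flow described above can be sketched in standalone C. The structure and function names below (`demo_ofld_ops`, `demo_setup_ctrl()`, the stub driver) and their signatures are hypothetical simplifications invented for this note; the real ops take driver-specific arguments not described in this cover letter.

```c
#include <stddef.h>

#define DEMO_MAX_QUEUES 8

/* Hypothetical subset of the ops a vendor driver registers with the ULP. */
struct demo_ofld_ops {
	const char *name;
	/* Returns 0 if the driver can reach the target via its net_dev. */
	int (*claim_dev)(const char *target_addr);
	/* Returns 0 on success; qid 0 stands for the admin queue. */
	int (*create_queue)(unsigned int qid);
};

/*
 * ULP side: ask the driver to claim the device, then create the admin
 * queue followed by the IO queues. Returns the number of queues created,
 * or -1 on failure.
 */
static int demo_setup_ctrl(const struct demo_ofld_ops *ops,
			   const char *target_addr,
			   unsigned int nr_io_queues)
{
	unsigned int qid;

	if (!ops || !ops->claim_dev || !ops->create_queue)
		return -1;
	if (nr_io_queues >= DEMO_MAX_QUEUES)
		return -1;
	if (ops->claim_dev(target_addr) != 0)
		return -1;

	for (qid = 0; qid <= nr_io_queues; qid++) {
		if (ops->create_queue(qid) != 0)
			return -1;
	}
	return (int)(nr_io_queues + 1);
}

/* Stub vendor driver, present only to exercise the sketch. */
static unsigned int demo_created;

static int demo_claim_dev(const char *target_addr)
{
	return target_addr ? 0 : -1;
}

static int demo_create_queue(unsigned int qid)
{
	(void)qid;
	demo_created++;
	return 0;
}

static const struct demo_ofld_ops demo_vendor_ops = {
	.name = "demo",
	.claim_dev = demo_claim_dev,
	.create_queue = demo_create_queue,
};
```

Keeping controller-level logic in the ULP and exposing only a small ops table to the vendor driver is what lets several offload drivers share one transport.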
      
      IO-path Design:
      ===============
      The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload
      ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
      driver and later, the nvme-tcp-offload vendor driver returns the request
      completion (the IO completion).
      No additional handling is needed in between; this design will reduce the
      CPU utilization as we will describe below.
      
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following IO-path ops:
      - send_req() - in order to pass the request to the handling of the
                     offload driver that shall pass it to the vendor specific device.
      - poll_queue()
      
      Once the IO completes, the nvme-tcp-offload vendor driver shall call
      command.done() that will invoke the nvme-tcp-offload ULP layer to
      complete the request.
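
The request/completion hand-off can be illustrated with a small callback sketch. All names here (`demo_req`, `demo_io_ops`, `demo_queue_rq()`) are hypothetical, and the stub driver completes the request synchronously, whereas a real device would complete it later from its own completion path.

```c
#include <stddef.h>

struct demo_req;

/* Completion callback supplied by the ULP, invoked by the vendor driver. */
typedef void (*demo_done_fn)(struct demo_req *req, int status);

struct demo_req {
	int tag;
	int completed;
	int status;
	demo_done_fn done;
};

/* Hypothetical IO-path ops registered by the vendor driver. */
struct demo_io_ops {
	int (*send_req)(struct demo_req *req);
};

/* ULP completion handler: records the result of the request. */
static void demo_ulp_done(struct demo_req *req, int status)
{
	req->status = status;
	req->completed = 1;
}

/* Stub vendor driver: pretends the hardware finished immediately. */
static int demo_send_req(struct demo_req *req)
{
	if (!req || !req->done)
		return -1;
	req->done(req, 0);
	return 0;
}

/* ULP submit path: attach the completion callback and hand off the IO. */
static int demo_queue_rq(const struct demo_io_ops *ops, struct demo_req *req)
{
	req->completed = 0;
	req->done = demo_ulp_done;
	return ops->send_req(req);
}
```

Between `send_req()` and the `done` callback the ULP does no per-packet work, which matches the IO-level design described in this section.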
      
      TCP events:
      ===========
      The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
      and OOO events.
      
      Teardown and errors:
      ====================
      In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver
      shall call nvme_tcp_ofld_report_queue_err().
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following teardown ops:
      - drain_queue()
      - destroy_queue()
      
      The Marvell FastLinQ NIC HW engine:
      ====================================
      The Marvell NIC HW engine is capable of offloading the entire TCP/IP
      stack and managing up to 64K connections per PF. Already implemented
      and upstream use cases for this include iWARP (by the Marvell qedr
      driver) and iSCSI (by the Marvell qedi driver).
      In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
      and is able to manage the IO level also in case of TCP re-transmissions
      and OOO events.
      The HW engine enables direct data placement (including the data digest CRC
      calculation and validation) and direct data transmission (including data
      digest CRC calculation).
      
      The Marvell qedn driver:
      ========================
      The new driver will be added under "drivers/nvme/hw" and will be enabled
      by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
      As part of the qedn init, the driver will register as a PCI device driver
      and will work with the Marvell FastLinQ NIC.
      As part of the probe, the driver will register to the nvme_tcp_offload
      (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
      "qed_*_ops" which are used by the qede, qedr, qedf and qedi device
      drivers.
      
      nvme-tcp-offload Future work:
      =============================
      - NVMF_OPT_HOST_IFACE Support.
      
      Changes since RFC v1:
      =====================
      - nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
      - nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
      - nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
      - nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and
        send_req() return values.
      
      Changes since RFC v2:
      =====================
      - nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
      - qedn: Add the Marvell's NVMeTCP HW offload vendor driver init and probe
        (patches 8-11).
      
      Changes since RFC v3:
      =====================
      - nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer
        including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new
        flows (ASYNC and timeout).
      - nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
      - nvme-tcp-offload: layer design and optimization changes.
      
      Changes since RFC v4:
      =====================
      (Many thanks to Hannes Reinecke for his feedback)
      - nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
      - nvme_tcp_offload: Add per device private_data.
      - nvme_tcp_offload: Fix header digest, data digest and tos initialization.
      
      Changes since RFC v5:
      =====================
      (Many thanks to Sagi Grimberg for his feedback)
      - nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
      - nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
      - nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
      - nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
      - nvme_tcp_offload: Change rwsem to mutex.
      - nvme_tcp_offload: remove redundant fields.
      - nvme_tcp_offload: Remove the "new" from setup_ctrl().
      - nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
      - nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() and
        nvme_tcp_ofld_free_queue().
      - nvme_tcp_offload: Patch 8 (timeout and async) was squashed into
        patch 7 (io level).
      
      Changes since RFC v6:
      =====================
      - No changes in nvme_tcp_offload (only in qedn).
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>