1. 03 Jun, 2021 40 commits
    • net:cxgb3: replace tasklets with works · 5e0b8928
      Íñigo Huguet authored
      OFLD and CTRL TX queues can be stopped if there is no room in
      their DMA rings. If this happens, a restart is attempted later,
      once some room has been made in the corresponding ring.
      
      The tasks of restarting these queues were triggered using
      tasklets, but they can be replaced with workqueue work items,
      getting them out of softirq context.
      
      These queue stops/restarts probably don't happen often, and the
      restarts can be quite lengthy because they try to send all pending
      skbs. Moreover, since the ring is probably not empty yet, the DMA
      still has work to do, so restarting is not urgent enough to justify
      using tasklets/softirq instead of running in a thread.
      Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tcp better handling of reordering then loss cases · a29cb691
      Yuchung Cheng authored
      This patch aims to improve the situation when reordering and loss are
      occurring in the same flight of packets.
      
      Previously the reordering would first induce a spurious recovery, then
      the subsequent ACK may undo the cwnd (based on the timestamps, for
      example). However, the current loss recovery does not proceed to invoke
      RACK to install a reordering timer. If some packets are also lost, this
      may lead to a long RTO-based recovery. An example is
      https://groups.google.com/g/bbr-dev/c/OFHADvJbTEI
      
      The solution is to always invoke RACK after reverting the recovery,
      either to arm the RACK timer to fast retransmit after the reordering
      window, or to restart the recovery if new loss is identified. Hence
      it is possible the sender may go from Recovery to Disorder/Open to
      Recovery again in one ACK.
      Reported-by: mingkun bian <bianmingkun@gmail.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bonding: Use strscpy_pad() instead of manually-truncated strncpy() · 43902070
      Kees Cook authored
      Silence this warning by using strscpy_pad() directly:
      
      drivers/net/bonding/bond_main.c:4877:3: warning: 'strncpy' specified bound 16 equals destination size [-Wstringop-truncation]
          4877 |   strncpy(params->primary, primary, IFNAMSIZ);
               |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
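
The warning fires because strncpy() with a bound equal to the destination size leaves the buffer without a NUL terminator whenever the source is 16 bytes or longer. The block below is a minimal userspace sketch of the semantics the kernel documents for strscpy_pad(): always terminate, zero-fill the tail, and report truncation. The names demo_strscpy_pad and DEMO_IFNAMSIZ are hypothetical stand-ins, and the -1 return value is an assumption of this sketch (the kernel helper returns -E2BIG).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_IFNAMSIZ 16 /* illustrative; mirrors the 16-byte bound in the warning */

/*
 * Hypothetical userspace stand-in for strscpy_pad().
 * Copies at most size - 1 bytes, always NUL-terminates, and zero-fills
 * the rest of dst. Returns the copied length, or -1 on truncation.
 */
long demo_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len = 0;

	assert(size > 0);

	/* Bounded scan: never reads more than size bytes of src. */
	while (len < size && src[len] != '\0')
		len++;

	if (len == size) {
		/* Source does not fit: keep size - 1 bytes and terminate. */
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -1;
	}

	memcpy(dst, src, len);
	memset(dst + len, 0, size - len); /* terminator plus zero padding */
	return (long)len;
}
```

With this shape, a caller such as the bonding code no longer needs to truncate or terminate the destination by hand after the copy.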
      
      Additionally replace other strncpy() uses, as it is considered deprecated:
      https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings

      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/lkml/202102150705.fdR6obB0-lkp@intel.com
      Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vlan: Avoid using strncpy() · 9c153d38
      Kees Cook authored
      Use strscpy_pad() instead of strncpy() which is considered deprecated:
      https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings

      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'NVMeTCP-Offload-ULP' · 5ff5622e
      David S. Miller authored
      Shai Malin says:
      
      ====================
      NVMeTCP Offload ULP
      
      With the goal of enabling a generic infrastructure that allows NVMe/TCP
      offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
      patch series introduces the nvme-tcp-offload ULP host layer, which will
      be a new transport type called "tcp-offload" and will serve as an
      abstraction layer to work with vendor specific nvme-tcp offload drivers.
      
      NVMeTCP offload is a full offload of the NVMeTCP protocol; this includes
      both the TCP level and the NVMeTCP level.
      
      The nvme-tcp-offload transport can co-exist with the existing tcp and
      other transports. The tcp offload was designed so that stack changes are
      kept to a bare minimum: only registering new transports.
      All other APIs, ops etc. are identical to the regular tcp transport.
      Representing the TCP offload as a new transport allows clear and manageable
      differentiation between the connections which should use the offload path
      and those that are not offloaded (even on the same device).
      
      The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
      
      * NVMe layer: *
      
             [ nvme/nvme-fabrics/blk-mq ]
                   |
              (nvme API and blk-mq API)
                   |
                   |
      * Vendor agnostic transport layer: *
      
            [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
                   |        |             |
                 (Verbs)
                   |        |             |
                   |     (Socket)
                   |        |             |
                   |        |        (nvme-tcp-offload API)
                   |        |             |
                   |        |             |
      * Vendor Specific Driver: *
      
                   |        |             |
                 [ qedr ]
                            |             |
                         [ qede ]
                                          |
                                        [ qedn ]
      
      Performance:
      ============
      With this implementation on top of the Marvell qedn driver (using the
      Marvell FastLinQ NIC), we were able to demonstrate the following CPU
      utilization improvement:
      
      On AMD EPYC 7402, 2.80GHz, 28 cores:
      - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
        Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
        NVMeTCP offload.
      
      On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
      - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
        Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
        NVMeTCP offload.
      
      In addition, we were able to demonstrate the following latency improvement:
      - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
        Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
        with NVMeTCP offload.
      
        Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec
        with NVMeTCP offload.
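
The before/after figures above can be turned into relative reductions with a one-line formula. The helper below is only an illustrative sketch; the function name is hypothetical and is not part of the series.

```c
#include <assert.h>

/*
 * Relative reduction, in percent, when a metric drops from `before`
 * to `after`. `before` must be positive.
 */
double relative_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Applied to the quoted numbers, this gives roughly a 69% CPU utilization reduction on the EPYC system (15.1% to 4.7%), roughly 93% on the Xeon system (16.3% to 1.1%), roughly 63% for the average latency (105 to 39 usec) and roughly 84% for the 99.99 tail latency (570 to 91 usec).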
      
      The end-to-end offload latency was measured from fio while running against
      a null-device back end.
      
      Upstream plan:
      ==============
      The RFC series "NVMeTCP Offload ULP and QEDN Device Driver"
      https://lore.kernel.org/netdev/20210531225222.16992-1-smalin@marvell.com/
      was designed in a modular way so that part 1 (nvme-tcp-offload) and
      part 2 (qed) are independent and part 3 (qedn) depends on both parts 1+2.
      
      - Part 1 (RFC patches 1-8): NVMeTCP Offload ULP
        The nvme-tcp-offload patches will be sent to
        'linux-nvme@lists.infradead.org'.
      
      - Part 2 (RFC patches 9-15): QED NVMeTCP Offload
        The qed infrastructure will be sent to 'netdev@vger.kernel.org'.
      
      Once part 1 and 2 are accepted:
      
      - Part 3 (RFC patches 16-27): QEDN NVMeTCP Offload
        The qedn patches will be sent to 'linux-nvme@lists.infradead.org'.
      
      Marvell is fully committed to maintain, test, and address issues with
      the new nvme-tcp-offload layer.
      
      Usage:
      ======
      With the Marvell NVMeTCP offload design, the network-device (qede) and the
      offload-device (qedn) are paired on each port, logically similar to the
      RDMA model.
      The user will interact with the network-device in order to configure
      the ip/vlan. The NVMeTCP configuration is populated as part of the
      nvme connect command.
      
      Example:
      Assign IP to the net-device (from any existing Linux tool):
      
          ip addr add 100.100.0.101/24 dev p1p1
      
      This IP will be used by both net-device (qede) and offload-device (qedn).
      
      In order to connect from "sw" nvme-tcp through the net-device (qede):
      
          nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn
      
      In order to connect from "offload" nvme-tcp through the offload-device (qedn):
      
          nvme connect -t tcp_offload -s 4420 -a 100.100.0.100 -n testnqn
      
      An alternative approach, as a future enhancement that will not impact this
      series, would be to modify nvme-cli with a new flag that determines
      whether "-t tcp" should be the regular nvme-tcp (which will be the default)
      or nvme-tcp-offload.
      Example:
          nvme connect -t tcp -s 4420 -a 100.100.0.100 -n testnqn -[new flag]
      
      Queue Initialization Design:
      ============================
      The nvme-tcp-offload ULP module shall register with the existing
      nvmf_transport_ops (.name = "tcp_offload"), nvme_ctrl_ops and blk_mq_ops.
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following ops:
      - claim_dev() - in order to resolve the route to the target according to
                      the paired net_dev.
      - create_queue() - in order to create offloaded nvme-tcp queue.
      
      The nvme-tcp-offload ULP module shall manage all the controller level
      functionalities, call claim_dev and based on the return values shall call
      the relevant module create_queue in order to create the admin queue and
      the IO queues.
      
      IO-path Design:
      ===============
      The nvme-tcp-offload shall work at the IO-level - the nvme-tcp-offload
      ULP module shall pass the request (the IO) to the nvme-tcp-offload vendor
      driver and later, the nvme-tcp-offload vendor driver returns the request
      completion (the IO completion).
      No additional handling is needed in between; this design reduces the
      CPU utilization, as described in the Performance section above.
      
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following IO-path ops:
      - send_req() - in order to pass the request to the handling of the
                     offload driver that shall pass it to the vendor specific device.
      - poll_queue()
      
      Once the IO completes, the nvme-tcp-offload vendor driver shall call
      command.done() that will invoke the nvme-tcp-offload ULP layer to
      complete the request.
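
The IO path described above (the ULP hands the request to the vendor's send_req(), and the vendor later calls done() on it) can be sketched in userspace as follows. All struct layouts, the single-slot "ring", and the demo_* and ulp_* names are assumptions made for illustration; they are not the definitions from tcp-offload.h.

```c
#include <assert.h>
#include <stddef.h>

struct ofld_queue;

struct ofld_req {
	int status;                         /* filled in by the vendor driver */
	int completed;                      /* set by the ULP completion handler */
	void (*done)(struct ofld_req *req); /* ULP callback invoked by the vendor */
};

struct ofld_io_ops {
	int (*send_req)(struct ofld_queue *q, struct ofld_req *req);
	int (*poll_queue)(struct ofld_queue *q);
};

struct ofld_queue {
	const struct ofld_io_ops *ops;
	struct ofld_req *inflight; /* single-slot stand-in for a HW ring */
};

/* ULP side: completion handler and submission path. */
void ulp_complete(struct ofld_req *req)
{
	req->completed = 1;
}

int ulp_queue_rq(struct ofld_queue *q, struct ofld_req *req)
{
	req->completed = 0;
	req->done = ulp_complete;
	return q->ops->send_req(q, req); /* no extra handling in between */
}

/* Vendor side: accept one request, complete it when polled. */
int demo_send_req(struct ofld_queue *q, struct ofld_req *req)
{
	if (q->inflight)
		return -1; /* no room in the (one-entry) ring */
	q->inflight = req;
	return 0;
}

int demo_poll_queue(struct ofld_queue *q)
{
	struct ofld_req *req = q->inflight;

	if (!req)
		return 0;
	q->inflight = NULL;
	req->status = 0;
	req->done(req); /* hands the completion back to the ULP */
	return 1;       /* number of completions processed */
}

const struct ofld_io_ops demo_io_ops = {
	.send_req = demo_send_req,
	.poll_queue = demo_poll_queue,
};
```

The point of the shape is that the ULP only sees a request going in and a completion coming out; everything TCP-related stays behind the ops table.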
      
      TCP events:
      ===========
      The Marvell FastLinQ NIC HW engine handles all the TCP re-transmissions
      and OOO events.
      
      Teardown and errors:
      ====================
      In case of an NVMeTCP queue error, the nvme-tcp-offload vendor driver shall
      call nvme_tcp_ofld_report_queue_err().
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following teardown ops:
      - drain_queue()
      - destroy_queue()
      
      The Marvell FastLinQ NIC HW engine:
      ====================================
      The Marvell NIC HW engine is capable of offloading the entire TCP/IP
      stack and managing up to 64K connections per PF. Use cases for this that
      are already implemented and upstream include iWARP (by the Marvell qedr
      driver) and iSCSI (by the Marvell qedi driver).
      In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
      and is able to manage the IO level also in case of TCP re-transmissions
      and OOO events.
      The HW engine enables direct data placement (including the data digest CRC
      calculation and validation) and direct data transmission (including data
      digest CRC calculation).
      
      The Marvell qedn driver:
      ========================
      The new driver will be added under "drivers/nvme/hw" and will be enabled
      by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
      As part of the qedn init, the driver will register as a PCI device driver
      and will work with the Marvell FastLinQ NIC.
      As part of the probe, the driver will register to the nvme_tcp_offload
      (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
      "qed_*_ops" which are used by the qede, qedr, qedf and qedi device
      drivers.
      
      nvme-tcp-offload Future work:
      =============================
      - NVMF_OPT_HOST_IFACE Support.
      
      Changes since RFC v1:
      =====================
      - nvme-tcp-offload: Fix nvme_tcp_ofld_ops return values.
      - nvme-tcp-offload: Remove NVMF_TRTYPE_TCP_OFFLOAD.
      - nvme-tcp-offload: Add nvme_tcp_ofld_poll() implementation.
      - nvme-tcp-offload: Fix nvme_tcp_ofld_queue_rq() to check map_sg() and
        send_req() return values.
      
      Changes since RFC v2:
      =====================
      - nvme-tcp-offload: Fixes in controller and queue level (patches 3-6).
      - qedn: Add the Marvell's NVMeTCP HW offload vendor driver init and probe
        (patches 8-11).
      
      Changes since RFC v3:
      =====================
      - nvme-tcp-offload: Add the full implementation of the nvme-tcp-offload layer
        including the new ops: setup_ctrl(), release_ctrl(), commit_rqs() and new
        flows (ASYNC and timeout).
      - nvme-tcp-offload: Add device maximums: max_hw_sectors, max_segments.
      - nvme-tcp-offload: layer design and optimization changes.
      
      Changes since RFC v4:
      =====================
      (Many thanks to Hannes Reinecke for his feedback)
      - nvme_tcp_offload: Add num_hw_vectors in order to limit the number of queues.
      - nvme_tcp_offload: Add per device private_data.
      - nvme_tcp_offload: Fix header digest, data digest and tos initialization.
      
      Changes since RFC v5:
      =====================
      (Many thanks to Sagi Grimberg for his feedback)
      - nvme-fabrics: Expose nvmf_check_required_opts() globally (as a new patch).
      - nvme_tcp_offload: Remove io-queues BLK_MQ_F_BLOCKING.
      - nvme_tcp_offload: Fix the nvme_tcp_ofld_stop_queue (drain_queue) flow.
      - nvme_tcp_offload: Fix the nvme_tcp_ofld_free_queue (destroy_queue) flow.
      - nvme_tcp_offload: Change rwsem to mutex.
      - nvme_tcp_offload: remove redundant fields.
      - nvme_tcp_offload: Remove the "new" from setup_ctrl().
      - nvme_tcp_offload: Remove the init_req() and commit_rqs() ops.
      - nvme_tcp_offload: Minor fixes in nvme_tcp_ofld_create_ctrl() and
        nvme_tcp_ofld_free_queue().
      - nvme_tcp_offload: Patch 8 (timeout and async) was squashed into
        patch 7 (io level).
      
      Changes since RFC v6:
      =====================
      - No changes in nvme_tcp_offload (only in qedn).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-tcp-offload: Add IO level implementation · 35155e26
      Dean Balandin authored
      In this patch, we present the IO level functionality.
      The nvme-tcp-offload shall work on the IO-level, meaning the
      nvme-tcp-offload ULP module shall pass the request to the nvme-tcp-offload
      vendor driver and shall wait for the request completion.
      No additional handling is needed in between; this design will reduce the
      CPU utilization as we will describe below.
      
      The nvme-tcp-offload vendor driver shall register to nvme-tcp-offload ULP
      with the following IO-path ops:
       - send_req - in order to pass the request to the handling of the offload
         driver that shall pass it to the vendor specific device
       - poll_queue
      
      The vendor driver will manage the context from which the request will be
      executed and the request aggregations.
      Once the IO completes, the nvme-tcp-offload vendor driver shall call
      command.done(), which shall invoke the nvme-tcp-offload ULP layer to
      complete the request.
      
      This patch also adds support for the nvme-tcp-offload timeout and
      nvme-tcp-offload ASYNC flow.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Dean Balandin <dbalandin@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-tcp-offload: Add queue level implementation · e4ba452d
      Dean Balandin authored
      In this patch we implement queue level functionality.
      The implementation is similar to the nvme-tcp module, the main
      difference being that we call the vendor specific create_queue op which
      creates the TCP connection and the NVMeTCP connection, including
      icreq+icresp negotiation.
      Once create_queue returns successfully, we can move on to the fabrics
      connect.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Dean Balandin <dbalandin@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-tcp-offload: Add controller level error recovery implementation · 5faf6d68
      Arie Gershberg authored
      In this patch, we implement controller level error handling and recovery.
      Upon an error discovered by the ULP, or a controller reset initiated by
      nvme-core (using the reset_ctrl workqueue), the ULP will initiate a
      controller recovery which includes teardown and re-connect of all queues.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Arie Gershberg <agershberg@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-tcp-offload: Add controller level implementation · 5aadd5f9
      Arie Gershberg authored
      In this patch we implement controller level functionality including:
      - create_ctrl.
      - delete_ctrl.
      - free_ctrl.
      
      The implementation is similar to other nvme fabrics modules, the main
      difference being that the nvme-tcp-offload ULP calls the vendor specific
      claim_dev() op with the given TCP/IP parameters to determine which device
      will be used for this controller.
      Once found, the vendor specific device and controller will be paired and
      kept in a controller list managed by the ULP.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Arie Gershberg <agershberg@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-tcp-offload: Add device scan implementation · 4b8178ec
      Dean Balandin authored
      As part of create_ctrl(), the ULP scans the registered devices and calls
      the claim_dev op on each of them, to find the first device that matches
      the connection params. Once the correct device is found (claim_dev
      returns true), we raise the refcnt of that device and return it
      as the device to be used for the ctrl currently being created.
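
The scan described above amounts to a first-match search with a reference taken on the winner. The following userspace sketch illustrates that shape; the struct layouts, the address-based match, and every name in it (scan_devices, claim_by_addr, conn_params) are hypothetical and are not the ULP's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical connection parameters; the real ULP passes TCP/IP params. */
struct conn_params {
	const char *local_addr;
};

struct ofld_dev;

struct ofld_ops {
	/* Returns true when this device can serve the given connection. */
	bool (*claim_dev)(struct ofld_dev *dev, const struct conn_params *params);
};

struct ofld_dev {
	const struct ofld_ops *ops;
	const char *addr; /* address of the paired net device (illustrative) */
	int refcnt;
};

/*
 * Walk the registered devices in order and return the first one whose
 * claim_dev() accepts params, taking a reference on it. Returns NULL
 * when no device claims the connection.
 */
struct ofld_dev *scan_devices(struct ofld_dev *devs, size_t n,
			      const struct conn_params *params)
{
	for (size_t i = 0; i < n; i++) {
		struct ofld_dev *dev = &devs[i];

		if (dev->ops && dev->ops->claim_dev &&
		    dev->ops->claim_dev(dev, params)) {
			dev->refcnt++;
			return dev;
		}
	}
	return NULL;
}

/* Example vendor callback: claim when the local address matches. */
bool claim_by_addr(struct ofld_dev *dev, const struct conn_params *params)
{
	return strcmp(dev->addr, params->local_addr) == 0;
}

const struct ofld_ops demo_ops = { .claim_dev = claim_by_addr };
```

A real implementation would also need locking around the device list and a matching reference drop when the controller is freed; both are left out of this sketch.
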
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Dean Balandin <dbalandin@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-fabrics: Expose nvmf_check_required_opts() globally · af527935
      Prabhakar Kushwaha authored
      nvmf_check_required_opts() is used to check whether the user-provided
      opts include the required_opts. If not, it logs which options are
      missing.
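
A check of this kind is typically a bitmask comparison followed by a per-option report. The sketch below shows the idea in userspace; the option bits, the name table and the function name are hypothetical and do not reproduce the kernel's NVMF_OPT_* values or the actual body of nvmf_check_required_opts().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical option bits, for illustration only. */
enum demo_opt {
	DEMO_OPT_TRANSPORT = 1u << 0,
	DEMO_OPT_NQN       = 1u << 1,
	DEMO_OPT_TRADDR    = 1u << 2,
	DEMO_OPT_TRSVCID   = 1u << 3,
};

struct demo_opt_name {
	unsigned int bit;
	const char *name;
};

const struct demo_opt_name demo_opt_names[] = {
	{ DEMO_OPT_TRANSPORT, "transport" },
	{ DEMO_OPT_NQN,       "nqn" },
	{ DEMO_OPT_TRADDR,    "traddr" },
	{ DEMO_OPT_TRSVCID,   "trsvcid" },
};

/*
 * Returns 0 when every bit in `required` is set in `mask`. Otherwise
 * logs each missing option to stderr and returns -1.
 */
int demo_check_required_opts(unsigned int mask, unsigned int required)
{
	size_t i;

	if ((mask & required) == required)
		return 0;

	for (i = 0; i < sizeof(demo_opt_names) / sizeof(demo_opt_names[0]); i++) {
		unsigned int bit = demo_opt_names[i].bit;

		if ((bit & required) && !(bit & mask))
			fprintf(stderr, "missing parameter '%s'\n",
				demo_opt_names[i].name);
	}
	return -1;
}
```

Because the required set is just an argument, a vendor driver with stricter needs can pass its own mask to the same helper.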
      
      It can be leveraged by nvme-tcp-offload to check if provided opts are
      supported by this specific vendor driver or not.
      
      So expose nvmf_check_required_opts() globally.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions · 98a5097d
      Prabhakar Kushwaha authored
      Move the NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
      to a header file, so they can be used by the different HW devices.
      
      NVMeTCP offload devices might have different limitations of the
      allowed options, for example, a device that does not support all the
      queue types. With tcp and rdma, only the nvme-tcp and nvme-rdma layers
      handle those attributes and the HW devices do not create any limitations
      for the allowed options.
      
      An alternative design could be to add separate fields in
      nvme_tcp_ofld_ops such as max_hw_sectors and max_segments that
      we already have in this series.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Arie Gershberg <agershberg@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Acked-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-tcp-offload: Add nvme-tcp-offload - NVMeTCP HW offload ULP · f0e8cb61
      Shai Malin authored
      This patch will present the structure for the NVMeTCP offload common
      layer driver. This module is added under "drivers/nvme/host/" and future
      offload drivers which will register to it will be placed under
      "drivers/nvme/hw".
      This new driver will be enabled by the Kconfig "NVM Express over Fabrics
      TCP offload common layer".
      In order to support the new transport type, for host mode, no change is
      needed.
      
      Each new vendor-specific offload driver will register to this ULP during
      its probe function, by filling out the nvme_tcp_ofld_dev->ops and
      nvme_tcp_ofld_dev->private_data and calling nvme_tcp_ofld_register_dev
      with the initialized struct.
      
      The internal implementation:
      - tcp-offload.h:
        Includes all common structs and ops to be used and shared by offload
        drivers.
      
      - tcp-offload.c:
        Includes the init function which registers as a NVMf transport just
        like any other transport.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Dean Balandin <dbalandin@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-cleanups' · ae1d9cc3
      David S. Miller authored
      Jon Maloy says:
      
      ====================
      tipc: some small cleanups
      
      We make some minor code cleanups and improvements.
      
      v2: Changed value of TIPC_ANY_SCOPE macro in patch #3
          to avoid compiler warning
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: simplify handling of lookup scope during multicast message reception · 5ef21325
      Jon Maloy authored
      We introduce a new macro TIPC_ANY_SCOPE to make the handling of the
      lookup scope value more comprehensible during multicast reception.
      
      The (unchanged) rules go as follows:
      
      1) Multicast messages sent from own node are delivered to all matching
         sockets on the own node, irrespective of their binding scope.
      
      2) Multicast messages sent from other nodes arrive here because they
         have found TIPC_CLUSTER_SCOPE bindings emanating from this node.
         Those messages should be delivered to exactly those sockets, but not
         to local sockets bound with TIPC_NODE_SCOPE, since the latter
         obviously were not meant to be visible for those senders.
      
      3) Group multicast/broadcast messages are delivered to the sockets with
         a binding scope matching exactly the lookup scope indicated in the
         message header, and nobody else.
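
The three rules above can be expressed as a choice of lookup scope followed by a match test, with the "any" value acting as a wildcard. The sketch below is a userspace illustration only: the enum values and function names are hypothetical, not the kernel's TIPC definitions, and checking the group rule first is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative scope values; not copied from the kernel headers. */
enum demo_scope {
	DEMO_CLUSTER_SCOPE,
	DEMO_NODE_SCOPE,
	DEMO_ANY_SCOPE, /* wildcard: matches every binding scope */
};

/* Pick the scope used to look up receiving sockets for a message. */
enum demo_scope demo_lookup_scope(bool from_own_node, bool group_msg,
				  enum demo_scope hdr_scope)
{
	if (group_msg)
		return hdr_scope;          /* rule 3: exact header scope */
	if (from_own_node)
		return DEMO_ANY_SCOPE;     /* rule 1: all matching sockets */
	return DEMO_CLUSTER_SCOPE;         /* rule 2: cluster bindings only */
}

/* A socket receives the message when its binding scope fits the lookup. */
bool demo_scope_matches(enum demo_scope binding, enum demo_scope lookup)
{
	return lookup == DEMO_ANY_SCOPE || binding == lookup;
}
```

With a wildcard value, the delivery test stays a single comparison instead of branching on where the message came from.
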
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: refactor function tipc_sk_anc_data_recv() · 62633c2f
      Jon Maloy authored
      We refactor tipc_sk_anc_data_recv() to make it slightly more
      comprehensible, but also to facilitate application of some additions
      to the code in a future commit.
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate redundant fields in struct tipc_sock · 14623e00
      Jon Maloy authored
      We eliminate the redundant fields conn_type and conn_instance in
      struct tipc_sock. On the connecting side, this information is already
      present in the unused (after the connection is established) part of
      the pre-allocated header, and on the accepting side, we put it there
      when the new socket is created.
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Tested-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'QED-NVMeTCP-Offload' · eda1bc65
      David S. Miller authored
      Shai Malin says:
      
      ====================
      QED NVMeTCP Offload
      
      Intro:
      ======
      This is the qed part of Marvell’s NVMeTCP offload series, shared as
      RFC series "NVMeTCP Offload ULP and QEDN Device Driver".
      This part is a standalone series, and is not dependent on other parts
      of the RFC.
      The overall goal is to add qedn as the offload driver for NVMeTCP,
      alongside the existing offload drivers (qedr, qedi and qedf for rdma,
      iscsi and fcoe respectively).
      
      In this series we are making the necessary changes to qed to enable this
      by exposing APIs for FW/HW initializations.
      
      The qedn series (and required changes to NVMe stack) will be sent to the
      linux-nvme mailing list.
      I have included more details on the upstream plan under section with the
      same name below.
      
      The Series Patches:
      ===================
      1. qed: Add TCP_ULP FW resource layout – replacing iSCSI when common
         with NVMeTCP.
      2. qed: Add NVMeTCP Offload PF Level FW and HW HSI.
      3. qed: Add NVMeTCP Offload Connection Level FW and HW HSI.
      4. qed: Add support of HW filter block – enables redirecting NVMeTCP
         traffic to the dedicated PF.
      5. qed: Add NVMeTCP Offload IO Level FW and HW HSI.
      6. qed: Add NVMeTCP Offload IO Level FW Initializations.
      7. qed: Add IP services APIs support – VLAN, IP routing and reserving
         TCP ports for the offload device.
      
      The NVMeTCP Offload:
      ====================
      With the goal of enabling a generic infrastructure that allows NVMe/TCP
      offload devices like NICs to seamlessly plug into the NVMe-oF stack, this
      patch series introduces the nvme-tcp-offload ULP host layer, which will
      be a new transport type called "tcp-offload" and will serve as an
      abstraction layer to work with vendor specific nvme-tcp offload drivers.
      
      NVMeTCP offload is a full offload of the NVMeTCP protocol; this includes
      both the TCP level and the NVMeTCP level.
      
      The nvme-tcp-offload transport can co-exist with the existing tcp and
      other transports. The tcp offload was designed so that stack changes are
      kept to a bare minimum: only registering new transports.
      All other APIs, ops etc. are identical to the regular tcp transport.
      Representing the TCP offload as a new transport allows clear and manageable
      differentiation between the connections which should use the offload path
      and those that are not offloaded (even on the same device).
      
      The nvme-tcp-offload layers and API compared to nvme-tcp and nvme-rdma:
      
      * NVMe layer: *
      
             [ nvme/nvme-fabrics/blk-mq ]
                   |
              (nvme API and blk-mq API)
                   |
                   |
      * Vendor agnostic transport layer: *
      
            [ nvme-rdma ] [ nvme-tcp ] [ nvme-tcp-offload ]
                   |        |             |
                 (Verbs)
                   |        |             |
                   |     (Socket)
                   |        |             |
                   |        |        (nvme-tcp-offload API)
                   |        |             |
                   |        |             |
      * Vendor Specific Driver: *
      
                   |        |             |
                 [ qedr ]
                            |             |
                         [ qede ]
                                          |
                                        [ qedn ]
      
      Performance:
      ============
      With this implementation on top of the Marvell qedn driver (using the
      Marvell FastLinQ NIC), we were able to demonstrate the following CPU
      utilization improvement:
      
      On AMD EPYC 7402, 2.80GHz, 28 cores:
      - For 16K queued read IOs, 16jobs, 4qd (50Gbps line rate):
        Improved the CPU utilization from 15.1% with NVMeTCP SW to 4.7% with
        NVMeTCP offload.
      
      On Intel(R) Xeon(R) Gold 5122 CPU, 3.60GHz, 16 cores:
      - For 512K queued read IOs, 16jobs, 4qd (25Gbps line rate):
        Improved the CPU utilization from 16.3% with NVMeTCP SW to 1.1% with
        NVMeTCP offload.
      
      In addition, we were able to demonstrate the following latency improvement:
      - For 200K read IOPS (16 jobs, 16 qd, with fio rate limiter):
        Improved the average latency from 105 usec with NVMeTCP SW to 39 usec
        with NVMeTCP offload.
      
        Improved the 99.99 tail latency from 570 usec with NVMeTCP SW to 91 usec
        with NVMeTCP offload.
      
      The end-to-end offload latency was measured from fio while running against
      a null-device back end.
      
      The Marvell FastLinQ NIC HW engine:
      ====================================
      The Marvell NIC HW engine is capable of offloading the entire TCP/IP
      stack and managing up to 64K connections per PF. Use cases for this that
      are already implemented and upstream include iWARP (by the Marvell qedr
      driver) and iSCSI (by the Marvell qedi driver).
      In addition, the Marvell NIC HW engine offloads the NVMeTCP queue layer
      and is able to manage the IO level also in case of TCP re-transmissions
      and OOO events.
      The HW engine enables direct data placement (including the data digest CRC
      calculation and validation) and direct data transmission (including data
      digest CRC calculation).
      
      The Marvell qedn driver:
      ========================
      The new driver will be added under "drivers/nvme/hw" and will be enabled
      by the Kconfig "Marvell NVM Express over Fabrics TCP offload".
      As part of the qedn init, the driver will register as a PCI device driver
      and will work with the Marvell FastLinQ NIC.
      As part of the probe, the driver will register to the nvme_tcp_offload
      (ULP) and to the qed module (qed_nvmetcp_ops) - similar to other
      "qed_*_ops" which are used by the qede, qedr, qedf and qedi device
      drivers.
      
      Upstream Plan:
      =============
      The RFC series "NVMeTCP Offload ULP and QEDN Device Driver"
      https://lore.kernel.org/netdev/20210531225222.16992-1-smalin@marvell.com/
      was designed in a modular way so that part 1 (nvme-tcp-offload) and
      part 2 (qed) are independent and part 3 (qedn) depends on both parts 1+2.
      
      - Part 1 (RFC patches 1-8): NVMeTCP Offload ULP
        The nvme-tcp-offload patches will be sent to
        'linux-nvme@lists.infradead.org'.
      
      - Part 2 (RFC patches 9-15): QED NVMeTCP Offload
        The qed infrastructure will be sent to 'netdev@vger.kernel.org'.
      
      Once part 1 and 2 are accepted:
      
      - Part 3 (RFC patches 16-27): QEDN NVMeTCP Offload
        The qedn patches will be sent to 'linux-nvme@lists.infradead.org'.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eda1bc65
    • Nikolay Assa's avatar
      qed: Add IP services APIs support · 806ee7f8
      Nikolay Assa authored
      This patch introduces APIs which the NVMeTCP Offload device (qedn)
      will use through the paired net-device (qede).
      It includes APIs for:
      - ipv4/ipv6 routing
      - get VLAN from net-device
      - TCP ports reservation
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Nikolay Assa <nassa@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      806ee7f8
    • Shai Malin's avatar
      qed: Add NVMeTCP Offload IO Level FW Initializations · 826da486
      Shai Malin authored
      This patch introduces the NVMeTCP FW initializations, which are used
      to initialize the IO level configuration into a per IO HW
      resource ("task") as part of the IO path flow.
      
      This includes:
      - Write IO FW initialization
      - Read IO FW initialization.
      - IC-Req and IC-Resp FW exchange.
      - FW Cleanup flow (Flush IO).
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      826da486
    • Shai Malin's avatar
      qed: Add NVMeTCP Offload IO Level FW and HW HSI · ab47bdfd
      Shai Malin authored
      This patch introduces the NVMeTCP Offload FW and HW HSI in order
      to initialize the IO level configuration into a per IO HW
      resource ("task") as part of the IO path flow.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ab47bdfd
    • Prabhakar Kushwaha's avatar
      qed: Add support of HW filter block · 203d136e
      Prabhakar Kushwaha authored
      This patch introduces the functionality of HW filter block.
      It adds and removes filters based on source and target TCP port.
      
      It also adds functionality to clear all filters at once.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      203d136e
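The add/remove/clear-all behaviour described for the HW filter block can be pictured with a small userspace model. This is only a sketch under stated assumptions: the table size, the `demo_` names and the return conventions are all hypothetical and do not reflect the actual qed code or the hardware layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_FILTERS 4 /* hypothetical capacity, not a hardware limit */

/* One filter entry keyed by source and target TCP port. */
struct demo_filter {
    bool in_use;
    uint16_t src_port;
    uint16_t dst_port;
};

struct demo_filter_block {
    struct demo_filter slot[DEMO_MAX_FILTERS];
};

/* Return the slot index holding (src, dst), or -1 if absent. */
static int demo_filter_find(const struct demo_filter_block *fb,
                            uint16_t src, uint16_t dst)
{
    for (int i = 0; i < DEMO_MAX_FILTERS; i++)
        if (fb->slot[i].in_use && fb->slot[i].src_port == src &&
            fb->slot[i].dst_port == dst)
            return i;
    return -1;
}

/* Add a filter; returns its slot index, or -1 when the table is full. */
int demo_filter_add(struct demo_filter_block *fb, uint16_t src, uint16_t dst)
{
    assert(fb != NULL);
    int i = demo_filter_find(fb, src, dst);
    if (i >= 0)
        return i; /* already present: reuse the existing slot */
    for (i = 0; i < DEMO_MAX_FILTERS; i++) {
        if (!fb->slot[i].in_use) {
            fb->slot[i] = (struct demo_filter){ true, src, dst };
            return i;
        }
    }
    return -1;
}

/* Remove a filter; returns 0 on success, -1 if no such filter exists. */
int demo_filter_remove(struct demo_filter_block *fb, uint16_t src, uint16_t dst)
{
    assert(fb != NULL);
    int i = demo_filter_find(fb, src, dst);
    if (i < 0)
        return -1;
    fb->slot[i].in_use = false;
    return 0;
}

/* Clear all filters at once. */
void demo_filter_clear_all(struct demo_filter_block *fb)
{
    assert(fb != NULL);
    memset(fb, 0, sizeof(*fb));
}
```

Keying each entry on the port pair keeps add and remove symmetric, and a bulk clear is a single reset of the table rather than a walk over individual entries.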
    • Shai Malin's avatar
      qed: Add NVMeTCP Offload Connection Level FW and HW HSI · 76684ab8
      Shai Malin authored
      This patch introduces the NVMeTCP HSI and HSI functionality in order to
      initialize and interact with the HW device as part of the connection level
      HSI.
      
      This includes:
      - Connection offload: offload a TCP connection to the FW.
      - Connection update: update the ICReq-ICResp params
      - Connection clear SQ: outstanding IOs FW flush.
      - Connection termination: terminate the TCP connection and flush the FW.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      76684ab8
    • Shai Malin's avatar
      qed: Add NVMeTCP Offload PF Level FW and HW HSI · 897e87a1
      Shai Malin authored
      This patch introduces the NVMeTCP device and PF level HSI and HSI
      functionality in order to initialize and interact with the HW device.
      The patch also adds qed NVMeTCP personality.
      
      This patch is based on the qede, qedr, qedi, qedf drivers HSI.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Dean Balandin <dbalandin@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      897e87a1
    • Omkar Kulkarni's avatar
      qed: Add TCP_ULP FW resource layout · 1bd4f571
      Omkar Kulkarni authored
      Add TCP_ULP as a storage common TCP offload FW resource layout.
      This will be used by the core driver (QED) for both the NVMeTCP and iSCSI.
      Acked-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Signed-off-by: Shai Malin <smalin@marvell.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1bd4f571
    • Krzysztof Kozlowski's avatar
      nfc: mrvl: reduce the scope of local variables · 2c95e6c7
      Krzysztof Kozlowski authored
      In two places the 'ep_desc' and 'skb' local variables are used only
      within an if() or for() block, so their scope can be reduced, which makes
      the entire code slightly easier to follow.  No functional change.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c95e6c7
    • Krzysztof Kozlowski's avatar
      nfc: mrvl: remove useless "continue" at end of loop · a5822404
      Krzysztof Kozlowski authored
      The "continue" statement at the end of a for loop does not have an
      effect.  Entire loop contents can be slightly simplified to increase
      code readability.  No functional change.
      Suggested-by: Joe Perches <joe@perches.com>
      Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5822404
    • David S. Miller's avatar
      Merge branch 'smc-next' · 81ac670a
      David S. Miller authored
      Karsten Graul says:
      
      ====================
      net/smc: updates 2021-06-02
      
      Please apply the following patch series for smc to netdev's net-next tree.
      
      Both patches are cleanups and remove unnecessary code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      81ac670a
    • Julian Wiedmann's avatar
      net/smc: no need to flush smcd_dev's event_wq before destroying it · 5e4a43ce
      Julian Wiedmann authored
      destroy_workqueue() already calls drain_workqueue(), which is a stronger
      variant of flush_workqueue().
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e4a43ce
    • Karsten Graul's avatar
      net/smc: avoid possible duplicate dmb unregistration · f8e0a68b
      Karsten Graul authored
      smc_lgr_cleanup() calls smcd_unregister_all_dmbs() as part of the link
      group termination process. This is a leftover from the times when
      smc_lgr_cleanup() scheduled a worker to actually free the link group.
      Nowadays smc_lgr_cleanup() directly calls smc_lgr_free() without any
      delay so an earlier dmb unregistration is no longer needed.
      So remove smcd_unregister_all_dmbs() and the call to it.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8e0a68b
    • David S. Miller's avatar
      Merge branch 'xpcs-phylink_pcs_ops' · c356be05
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Convert xpcs to phylink_pcs_ops
      
      Background: the sja1105 DSA driver currently drives a Designware XPCS
      for SGMII and 2500base-X, and it would be nice to reuse some code with
      the xpcs module. This would also help consolidate the phylink_pcs_ops,
      since the only other dedicated PCS driver, currently, is the lynx_pcs.
      
      Therefore, this series makes the xpcs expose the same kind of API that
      the lynx_pcs module does. The main changes are getting rid of struct
      mdio_xpcs_ops, being compatible with struct phylink_pcs_ops and being
      less reliant on the phy_interface_t passed to xpcs_probe (now renamed to
      xpcs_create).
      
      This patch series is partially tested (some code paths have been covered
      on the NXP SJA1105 and some others with the help of Vee Khee Wong on
      Intel Tiger Lake / stmmac) but further testing on 10G setups would be
      appreciated, if possible.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c356be05
    • Vladimir Oltean's avatar
      net: pcs: xpcs: convert to phylink_pcs_ops · 11059740
      Vladimir Oltean authored
      Since all the remaining members of struct mdio_xpcs_ops have direct
      equivalents in struct phylink_pcs_ops, it is about time we remove it
      altogether.
      
      Since the phylink ops return void, we need to remove the error
      propagation from the various xpcs methods and simply print an error
      message where appropriate.
      
      Since xpcs_get_state_c73() detects link faults and attempts to reset the
      link on its own by calling xpcs_config(), but xpcs_config() now has a
      lot of phylink arguments which are not needed and cannot be simply
      fabricated by anybody else except phylink, the actual implementation has
      been moved into a smaller xpcs_do_config().
      
      The const struct mdio_xpcs_ops *priv->hw->xpcs has been removed, so we
      need to look at the struct mdio_xpcs_args pointer now as an indication
      whether the port has an XPCS or not.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      11059740
    • Vladimir Oltean's avatar
      net: pcs: xpcs: convert to mdio_device · 2cac15da
      Vladimir Oltean authored
      Unify the 2 existing PCS drivers (lynx and xpcs) by doing a similar
      thing on probe, which is to have a *_create function that takes a
      struct mdio_device * given by the caller, and builds a private PCS
      structure around that.
      
      This changes stmmac to hold only a pointer to the xpcs, as opposed to
      the full structure. This will be used in the next patch when struct
      mdio_xpcs_ops is removed. Currently a pointer to struct mdio_xpcs_ops
      is used as a shorthand to determine whether the port has an XPCS or not.
      We can do the same now with the mdio_xpcs_args pointer.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2cac15da
    • Vladimir Oltean's avatar
      net: pcs: xpcs: use mdiobus_c45_addr in xpcs_{read,write} · 679e283e
      Vladimir Oltean authored
      Use the dedicated helper for abstracting away how the clause 45 address
      is packed in reg_addr.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      679e283e
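To make concrete what such a helper abstracts away, here is a userspace sketch of clause 45 address packing. The constants follow my understanding of the kernel's `include/linux/mdio.h` at the time (a clause 45 flag in bit 30, the device address shifted left by 16, the register number in the low 16 bits); treat them as assumptions. The `demo_` names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a packed clause 45 reg_addr (see lead-in). */
#define DEMO_ADDR_C45          (1u << 30) /* marks the address as clause 45 */
#define DEMO_DEVADDR_C45_SHIFT 16         /* device (MMD) address position */
#define DEMO_REGADDR_C45_MASK  0xffffu    /* register number, low 16 bits */

/* Pack an MMD device address and register number into one value. */
uint32_t demo_c45_addr(unsigned int devad, uint16_t regnum)
{
    assert(devad < 32); /* clause 45 device addresses are 5 bits wide */
    return DEMO_ADDR_C45 | ((uint32_t)devad << DEMO_DEVADDR_C45_SHIFT) | regnum;
}

/* Recover the device address from a packed value. */
unsigned int demo_c45_devad(uint32_t reg_addr)
{
    return (reg_addr >> DEMO_DEVADDR_C45_SHIFT) & 0x1f;
}

/* Recover the register number from a packed value. */
uint16_t demo_c45_regnum(uint32_t reg_addr)
{
    return (uint16_t)(reg_addr & DEMO_REGADDR_C45_MASK);
}
```

With a helper of this shape, read and write paths only name the device and register, and the bit layout lives in one place.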
    • Vladimir Oltean's avatar
      net: pcs: xpcs: export xpcs_probe · 8e2bb956
      Vladimir Oltean authored
      Similar to the other recently exported functions, it is not necessary
      for xpcs_probe to be a function pointer, so export it so that it can be
      called directly.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8e2bb956
    • Vladimir Oltean's avatar
      net: pcs: xpcs: export xpcs_config_eee · 14b517cb
      Vladimir Oltean authored
      There is no good reason why we need to go through:
      
      stmmac_xpcs_config_eee
      -> stmmac_do_callback
         -> mdio_xpcs_ops->config_eee
            -> xpcs_config_eee
      
      when we can simply call xpcs_config_eee.
      
      priv->hw->xpcs is of the type "const struct mdio_xpcs_ops *" and is used
      as a placeholder/synonym for priv->plat->mdio_bus_data->has_xpcs. It is
      done that way because the mdio_bus_data pointer might or might not be
      populated in all stmmac instantiations.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14b517cb
    • Vladimir Oltean's avatar
      net: pcs: xpcs: export xpcs_validate · a1a753ed
      Vladimir Oltean authored
      Calling a function pointer with a single implementation through
      struct mdio_xpcs_ops is clunky, and the stmmac_do_callback system forces
      this to return int, even though it always returns zero.
      
      Simply remove the "validate" function pointer from struct mdio_xpcs_ops
      and replace it with an exported xpcs_validate symbol which is called
      directly by stmmac.
      
      priv->hw->xpcs is of the type "const struct mdio_xpcs_ops *" and is used
      as a placeholder/synonym for priv->plat->mdio_bus_data->has_xpcs. It is
      done that way because the mdio_bus_data pointer might or might not be
      populated in all stmmac instantiations.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1a753ed
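The difference between the two call styles can be sketched in plain C. Everything below is a hypothetical model: the `demo_` types, the bitmask used for "supported" modes and the dispatcher are illustrations only, not the stmmac or xpcs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PCS private data. */
struct demo_xpcs {
    unsigned long supported; /* bitmask of modes the PCS can do */
};

/* "After" style: an exported function, called directly, returning void. */
void demo_xpcs_validate(struct demo_xpcs *xpcs, unsigned long *supported)
{
    assert(xpcs != NULL && supported != NULL);
    *supported &= xpcs->supported; /* drop modes the PCS cannot do */
}

/* "Before" style: an ops table whose single implementation must return
 * int only because the dispatcher expects it. */
struct demo_xpcs_ops {
    int (*validate)(struct demo_xpcs *xpcs, unsigned long *supported);
};

static int demo_validate_via_ops(struct demo_xpcs *xpcs,
                                 unsigned long *supported)
{
    demo_xpcs_validate(xpcs, supported);
    return 0; /* always zero, so the return value carries no information */
}

const struct demo_xpcs_ops demo_ops = { .validate = demo_validate_via_ops };

/* Generic dispatcher: fails only when no callback is installed. */
int demo_do_callback(const struct demo_xpcs_ops *ops, struct demo_xpcs *xpcs,
                     unsigned long *supported)
{
    if (ops == NULL || ops->validate == NULL)
        return -1;
    return ops->validate(xpcs, supported);
}
```

Both paths produce the same result here; the direct call simply removes a layer of indirection and a return value that callers would otherwise have to check.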
    • Vladimir Oltean's avatar
      net: pcs: xpcs: make the checks related to the PHY interface mode stateless · 9900074e
      Vladimir Oltean authored
      The operating mode of the driver is currently to populate its
      struct mdio_xpcs_args::supported and struct mdio_xpcs_args::an_mode
      statically in xpcs_probe(), based on the passed phy_interface_t,
      and work with those.
      
      However this is not the operation that phylink expects from a PCS
      driver, because the port might be attached to an SFP cage that triggers
      changes of the phy_interface_t dynamically as one SFP module is
      unplugged and another is plugged.
      
      To migrate towards that model, the struct mdio_xpcs_args should not
      cache anything related to the phy_interface_t, but just look up the
      statically defined, const struct xpcs_compat structure corresponding to
      the detected PCS OUI/model number.
      
      So we delete the "supported" and "an_mode" members of struct
      mdio_xpcs_args, and add the "id" structure there (since the ID is not
      expected to change at runtime).
      
      Since xpcs->supported is used deep in the code in _xpcs_config_aneg_c73(),
      we need to modify some function headers to pass the xpcs_compat from all
      callers. In turn, the xpcs_compat is always supplied externally to the
      xpcs module:
      - Most of the time by phylink
      - In xpcs_probe() it is needed because xpcs_soft_reset() writes to
        MDIO_MMD_PCS or to MDIO_MMD_VEND2 depending on whether an_mode is clause
        37 or clause 73. In order to not introduce functional changes related
        to when the soft reset is issued, we continue to require the initial
        phy_interface_t argument to be passed to xpcs_probe() so we can pass
        this on to xpcs_soft_reset().
      - stmmac_open() wants to know whether to call stmmac_init_phy() or not,
        and for that it looks inside xpcs->an_mode, because the clause 73
        (backplane) AN modes supposedly do not have a PHY. Because we moved
        an_mode outside of struct mdio_xpcs_args, this is now no longer
        directly possible, so we introduce a helper function xpcs_get_an_mode()
        which protects the data encapsulation of the xpcs module and requires
        a phy_interface_t to be passed as argument. This function can look up
        the appropriate compat based on the phy_interface_t.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9900074e
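The stateless model described above, where nothing derived from the interface mode is cached and the AN mode is looked up on demand, might look roughly like the following userspace sketch. The enum values, the table contents and the `demo_` function names are hypothetical. In particular, which interface maps to which AN mode, and which MMD each AN mode resets through, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for phy_interface_t, the AN modes and the MMDs. */
enum demo_iface {
    DEMO_IFACE_USXGMII,
    DEMO_IFACE_10GKR,
    DEMO_IFACE_XLGMII,
    DEMO_IFACE_SGMII,
    DEMO_IFACE_OTHER, /* an interface the PCS does not support */
};
enum demo_an_mode { DEMO_AN_C73, DEMO_AN_C37_SGMII };
enum demo_mmd { DEMO_MMD_PCS, DEMO_MMD_VEND2 };

struct demo_compat {
    enum demo_iface iface;
    enum demo_an_mode an_mode;
};

/* Statically defined, const table: nothing is cached per device. */
static const struct demo_compat demo_compat_table[] = {
    { DEMO_IFACE_USXGMII, DEMO_AN_C73 },
    { DEMO_IFACE_10GKR,   DEMO_AN_C73 },
    { DEMO_IFACE_XLGMII,  DEMO_AN_C73 },
    { DEMO_IFACE_SGMII,   DEMO_AN_C37_SGMII },
};

/* Look up the compat entry for an interface, or NULL if unsupported. */
static const struct demo_compat *demo_find_compat(enum demo_iface iface)
{
    size_t n = sizeof(demo_compat_table) / sizeof(demo_compat_table[0]);

    assert(iface <= DEMO_IFACE_OTHER);
    for (size_t i = 0; i < n; i++)
        if (demo_compat_table[i].iface == iface)
            return &demo_compat_table[i];
    return NULL;
}

/* Helper for outside callers: AN mode for an interface, or -1. */
int demo_get_an_mode(enum demo_iface iface)
{
    const struct demo_compat *compat = demo_find_compat(iface);

    return compat ? (int)compat->an_mode : -1;
}

/* Pick the MMD a soft reset would target for an interface, or -1. */
int demo_soft_reset_mmd(enum demo_iface iface)
{
    const struct demo_compat *compat = demo_find_compat(iface);

    if (compat == NULL)
        return -1;
    return compat->an_mode == DEMO_AN_C73 ? DEMO_MMD_PCS : DEMO_MMD_VEND2;
}
```

Because every query takes the interface as an argument, a change of interface mode at runtime needs no state to be invalidated; the next lookup simply returns a different entry.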
    • Vladimir Oltean's avatar
      net: pcs: xpcs: there is only one PHY ID · a54a8b71
      Vladimir Oltean authored
      The xpcs driver has an apparently inadequate structure for the actual
      hardware it drives.
      
      These defines and the xpcs_probe() function would suggest that there is
      one PHY ID per supported PHY interface type, and the driver simply
      validates whether the mode it should operate in (the argument of
      xpcs_probe) matches what the hardware is capable of:
      
      	#define SYNOPSYS_XPCS_USXGMII_ID	0x7996ced0
      	#define SYNOPSYS_XPCS_10GKR_ID		0x7996ced0
      	#define SYNOPSYS_XPCS_XLGMII_ID		0x7996ced0
      	#define SYNOPSYS_XPCS_SGMII_ID		0x7996ced0
      	#define SYNOPSYS_XPCS_MASK		0xffffffff
      
      but that is not the case, because upon closer inspection, all the above
      4 PHY ID definitions are in fact equal.
      
      So it is the same XPCS that is compatible with all 4 sets of PHY
      interface types.
      
      This change introduces an array of struct xpcs_compat which is populated
      by the single struct xpcs_id instance. It also eliminates the bogus
      defines for multiple Synopsys XPCS PHY IDs and replaces them with a
      single XPCS_ID, which better reflects the way in which the hardware
      operates.
      
      Because we are touching this area of the code anyway, the new array of
      struct xpcs_compat, as well as the array of xpcs_id, have been moved
      towards the end of the file, since they are variable declarations not
      definitions. If either struct xpcs_compat or struct xpcs_id needs
      to gain a function pointer member in the future, it is easier to
      reference functions (no forward declarations needed) if we have the
      const variable declarations at the end of the file.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a54a8b71
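The observation that the four per-interface PHY IDs are equal can be checked mechanically. The sketch below copies the defines quoted in the commit message and adds two hypothetical helpers (`demo_id_matches`, `demo_count_matches`) that are not part of the driver; it only illustrates why an ID comparison cannot tell the interface types apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defines as quoted in the commit message. */
#define SYNOPSYS_XPCS_USXGMII_ID 0x7996ced0
#define SYNOPSYS_XPCS_10GKR_ID   0x7996ced0
#define SYNOPSYS_XPCS_XLGMII_ID  0x7996ced0
#define SYNOPSYS_XPCS_SGMII_ID   0x7996ced0
#define SYNOPSYS_XPCS_MASK       0xffffffff

/* Compile-time proof that the "different" IDs are one and the same. */
static_assert(SYNOPSYS_XPCS_USXGMII_ID == SYNOPSYS_XPCS_10GKR_ID &&
              SYNOPSYS_XPCS_10GKR_ID == SYNOPSYS_XPCS_XLGMII_ID &&
              SYNOPSYS_XPCS_XLGMII_ID == SYNOPSYS_XPCS_SGMII_ID,
              "all four PHY ID defines are expected to be equal");

/* Hypothetical helper: masked comparison of a hardware ID with a define. */
int demo_id_matches(uint32_t phy_id, uint32_t id)
{
    return (phy_id & SYNOPSYS_XPCS_MASK) == (id & SYNOPSYS_XPCS_MASK);
}

/* Hypothetical helper: how many per-interface IDs a hardware ID matches. */
int demo_count_matches(uint32_t phy_id)
{
    static const uint32_t ids[] = {
        SYNOPSYS_XPCS_USXGMII_ID,
        SYNOPSYS_XPCS_10GKR_ID,
        SYNOPSYS_XPCS_XLGMII_ID,
        SYNOPSYS_XPCS_SGMII_ID,
    };
    int count = 0;

    for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
        count += demo_id_matches(phy_id, ids[i]);
    return count;
}
```

A hardware ID either matches all four entries or none of them, which is why a single ID paired with a list of compatible interface types describes the hardware more faithfully.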
    • Vladimir Oltean's avatar
      net: pcs: xpcs: delete shim definition for mdio_xpcs_get_ops() · b81017ae
      Vladimir Oltean authored
      CONFIG_STMMAC_ETH selects CONFIG_PCS_XPCS, so there should be no
      situation where the shim should be needed.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b81017ae