1. 09 Apr, 2024 24 commits
  2. 08 Apr, 2024 16 commits
      r8169: add support for RTL8168M · 39f59c72
      Heiner Kallweit authored
      A user reported an unknown chip version. According to the r8168 vendor
      driver it's called RTL8168M, but handling is identical to RTL8168H.
      So let's simply treat it as RTL8168H.
      Tested-by: Евгений <octobergun@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'devlink-io-eqs' · 358961f5
      David S. Miller authored
      Parav Pandit says:
      
      ====================
      devlink: Add port function attribute for IO EQs
      
      Currently, PCI SFs and VFs use IO event queues to deliver netdev per
      channel events. The number of netdev channels is a function of IO
      event queues. Likewise, in the case of an RDMA device, the number of
      completion vectors is a function of IO event queues. Currently, an
      administrator on the hypervisor has no means to provision the number
      of IO event queues for the SF device or the VF device; the
      device/firmware picks an arbitrary value. As a result, the number of
      SF netdev channels is unpredictable, and consequently, so is the
      performance.
      
      This short series introduces a new port function attribute: max_io_eqs.
      The goal is to provide administrators at the hypervisor level with the
      ability to provision the maximum number of IO event queues for a
      function. This lets the administrator provision the right number of
      IO event queues and get predictable performance.

      Example of an administrator setting the maximum number of IO event
      queues when using switchdev mode:
      
        $ devlink port show pci/0000:06:00.0/1
            pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
                function:
                hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 10
      
        $ devlink port function set pci/0000:06:00.0/1 max_io_eqs 20
      
        $ devlink port show pci/0000:06:00.0/1
            pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
                function:
                hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
      
      This sets the corresponding maximum IO event queues of the function
      before it is enumerated. Thus, when the VF/SF driver reads the
      capability from the device, it sees the value provisioned by the
      hypervisor. The driver is then able to configure the number of channels
      for the net device, as well as the number of completion vectors
      for the RDMA device. The device/firmware also honors the provisioned
      value, hence any attempt by a VF/SF driver to create IO EQs
      beyond the provisioned value results in an error.
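
A provisioning script may want to confirm the value after setting it. The following is a minimal sketch, assuming the text layout of `devlink port show` matches the example output above. The helper name `parse_max_io_eqs` and the `SAMPLE` string are hypothetical and only illustrate the parsing step.

```python
import re


def parse_max_io_eqs(port_show_output):
    """Return the max_io_eqs value found in `devlink port show` text, or None.

    Assumes the attribute is printed as "max_io_eqs <number>", as in the
    example output quoted in the cover letter.
    """
    match = re.search(r"\bmax_io_eqs\s+(\d+)", port_show_output)
    return int(match.group(1)) if match else None


# Sample text copied from the cover letter's example output.
SAMPLE = """\
pci/0000:06:00.0/1: type eth netdev enp6s0pf0vf0 flavour pcivf pfnum 0 vfnum 0
    function:
    hw_addr 00:00:00:00:00:00 roce enable max_io_eqs 20
"""

if __name__ == "__main__":
    print(parse_max_io_eqs(SAMPLE))
```

Returning None when the attribute is absent lets a caller tell "not supported by this port" apart from a provisioned value.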
      
      With the above settings now in place, the administrator achieved 2x
      performance with the SF device with 20 channels. In the second example,
      when the SF was provisioned for a container with 2 CPUs, the administrator
      provisioned only 2 IO event queues, thereby saving device resources.
      
      changelog:
      v2->v3:
      - limited to 80 chars per line in devlink
      - fixed comments from Jakub in mlx5 driver to fix missing mutex unlock
        on error path
      v1->v2:
      - limited comment to 80 chars per line in header file
      - fixed set function variables for reverse christmas tree
      - fixed comments from Kalesh
      - fixed missing kfree in get call
      - returning error code for get cmd failure
      - fixed error msg copy paste error in set on cmd failure
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mlx5/core: Support max_io_eqs for a function · 93197c7c
      Parav Pandit authored
      Implement get and set for the maximum IO event queues for SF and VF.
      This enables the administrator on the hypervisor to control the maximum
      number of IO event queues, which are typically used to derive the maximum
      and default number of net device channels or RDMA device completion vectors.
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      devlink: Support setting max_io_eqs · 5af3e387
      Parav Pandit authored
      Many devices send event notifications for the IO queues,
      such as tx and rx queues, through event queues.
      
      Enable a privileged owner, such as a hypervisor PF, to set the number
      of IO event queues for the VF and SF during the provisioning stage.
      
      example:
      Get maximum IO event queues of the VF device::
      
        $ devlink port show pci/0000:06:00.0/2
        pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
            function:
                hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10
      
      Set maximum IO event queues of the VF device::
      
        $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32
      
        $ devlink port show pci/0000:06:00.0/2
        pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1
            function:
                hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: display more skb fields in skb_dump() · 4308811b
      Eric Dumazet authored
      Print these additional fields in skb_dump() to ease debugging.
      
      - mac_len
      - csum_start (in v2, at Willem's suggestion)
      - csum_offset (in v2, at Willem's suggestion)
      - priority
      - mark
      - alloc_cpu
      - vlan_all
      - encapsulation
      - inner_protocol
      - inner_mac_header
      - inner_network_header
      - inner_transport_header
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'phy-cleanup-EEE' · 7812da81
      David S. Miller authored
      Andrew Lunn says:
      
      ====================
      net: Clean up some EEE code
      
      Previous patches have reworked the API between phylib and MAC drivers
      with respect to EEE, pushing most of the work into phylib. These two
      patches rework two drivers to make use of the new API, and fix their
      EEE implementation, so that EEE is configured in the MAC based on what
      is actually negotiated during autoneg.
      
      Compile tested only.
      ====================
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: lan743x: Fixup EEE · ef460a89
      Andrew Lunn authored
      The enabling/disabling of EEE in the MAC should happen as a result of
      auto negotiation. So move the enable/disable into
      lan743x_phy_link_status_change() which gets called by phylib when
      there is a change in link status.
      
      lan743x_ethtool_set_eee() now just programs the hardware with the LPI
      timer value, and passes everything else to phylib, so it can correctly
      set up the PHY.

      lan743x_ethtool_get_eee() relies on phylib doing most of the work; the
      MAC driver just adds the LPI timer value.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      net: usb: lan78xx: Fixup EEE · a00bbd15
      Andrew Lunn authored
      The enabling/disabling of EEE in the MAC should happen as a result of
      auto negotiation. So move the enable/disable into
      lan78xx_link_status_change() which gets called by phylib when
      there is a change in link status.

      lan78xx_set_eee() now just programs the hardware with the LPI
      timer value, and passes everything else to phylib, so it can correctly
      set up the PHY.

      lan78xx_get_eee() relies on phylib doing most of the work; the
      MAC driver just adds the LPI timer value.
      
      Call phy_support_eee() to indicate the MAC does actually support EEE.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      mptcp: add reset reason options in some places · 382c6001
      Jason Xing authored
      The reason codes are handled in two ways nowadays (quoting Mat Martineau):
      1. Sending in the MPTCP option on RST packets when there is no subflow
      context available (these use subflow_add_reset_reason() and directly call
      a TCP-level send_reset function)
      2. The "normal" way via subflow->reset_reason. This will propagate to both
      the outgoing reset packet and to a local path manager process via netlink
      in mptcp_event_sub_closed()
      
      RFC 8684 defines the reset reason behaviour, which is optional rather
      than required:
      
          A host sends a TCP RST in order to close a subflow or reject
          an attempt to open a subflow (MP_JOIN). In order to let the
          receiving host know why a subflow is being closed or rejected,
          the TCP RST packet MAY include the MP_TCPRST option (Figure 15).
          The host MAY use this information to decide, for example, whether
          it tries to re-establish the subflow immediately, later, or never.
      
      Since commit dc87efdb ("mptcp: add mptcp reset option support")
      introduced this feature about three years ago, we can fully use it.
      There remain some places where we could insert the reason into the skb,
      as this patch shows.
      
      Many thanks to Mat and Paolo for help:)
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ipv4: Set scope explicitly in ip_route_output(). · ec20b283
      Guillaume Nault authored
      Add a "scope" parameter to ip_route_output() so that callers don't have
      to override the tos parameter with the RTO_ONLINK flag if they want a
      local scope.
      
      This will allow converting flowi4_tos to dscp_t in the future, thus
      allowing static analysers to flag invalid interactions between
      "tos" (the DSCP bits) and ECN.
      
      Only three users ask for local scope (bonding, arp and atm). The others
      continue to use RT_SCOPE_UNIVERSE. While there, add a comment to warn
      users about the limitations of ip_route_output().
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: Leon Romanovsky <leonro@nvidia.com> # infiniband
      Signed-off-by: David S. Miller <davem@davemloft.net>
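
To illustrate why the ip_route_output() change above separates scope from tos, here is a small userspace model, not kernel code. The constant values (RTO_ONLINK = 0x01, the ECN mask 0x03, RT_SCOPE_LINK = 253, RT_SCOPE_UNIVERSE = 0) are assumptions recalled from kernel headers. The function names are hypothetical, and mapping the legacy flag to link scope is likewise an assumption.

```python
# All values below are assumptions used for illustration only.
RTO_ONLINK = 0x01        # legacy flag carried in the low bit of "tos"
INET_ECN_MASK = 0x03     # the two ECN bits of the TOS byte
DSCP_MASK = 0xFC         # the six DSCP bits of the TOS byte
RT_SCOPE_UNIVERSE = 0
RT_SCOPE_LINK = 253


def legacy_scope_from_tos(tos):
    """Old convention: the scope is inferred from a flag hidden in tos."""
    return RT_SCOPE_LINK if tos & RTO_ONLINK else RT_SCOPE_UNIVERSE


def explicit_lookup_args(dscp, scope=RT_SCOPE_UNIVERSE):
    """New convention: tos carries only DSCP bits, scope is its own argument."""
    if dscp & ~DSCP_MASK:
        raise ValueError("non-DSCP bits set in tos")
    return dscp, scope


if __name__ == "__main__":
    # The legacy flag occupies one of the ECN bit positions.
    print(bool(RTO_ONLINK & INET_ECN_MASK))
    print(legacy_scope_from_tos(0x10 | RTO_ONLINK))
    print(explicit_lookup_args(0x10, RT_SCOPE_LINK))
```

In this model, once the scope is an explicit argument, any low bit set in tos can be rejected outright, which is the kind of check a dedicated DSCP type would make possible.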
      ipvlan: handle NETDEV_DOWN event · 22978397
      Venkat Venkatsubra authored
      In case of stacked devices, to help propagate the down
      link state from the parent/root device (to this leaf device),
      handle NETDEV_DOWN event like it is done now for NETDEV_UP.
      
      In the below example, ens5 is the host interface which is the
      parent of the ipvlan interface eth0 in the container.
      
      Host:
      
      [root@gkn-podman-x64 ~]# ip link set ens5 down
      [root@gkn-podman-x64 ~]# ip -d link show dev ens5
      3: ens5: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN
            ...
      [root@gkn-podman-x64 ~]#
      
      Container:
      
      [root@testnode-ol8 /]# ip -d link show dev eth0
      2: eth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 state UNKNOWN
              ...
          ipvlan mode l2 bridge
              ...
      [root@testnode-ol8 /]#
      
      eth0's state continues to show up as UP even though ens5 is now DOWN.
      
      For macvlan the handling of NETDEV_DOWN event was added in
      commit 80fd2d6c ("macvlan: Change status when lower device goes down").
      Reported-by: Gia-Khanh Nguyen <gia-khanh.nguyen@oracle.com>
      Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af_packet: avoid a false positive warning in packet_setsockopt() · 86d43e2b
      Eric Dumazet authored
      Although the code is correct, the following line
      
      	copy_from_sockptr(&req_u.req, optval, len));
      
      triggers this warning:
      
      memcpy: detected field-spanning write (size 28) of single field "dst" at include/linux/sockptr.h:49 (size 16)
      
      Refactor the code to be more explicit.
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
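
The sizes in the af_packet warning above (28 versus 16) can be reproduced with a userspace model. This is a sketch using ctypes, assuming the request layouts are four and seven unsigned 32-bit fields as in the UAPI packet header. The Python class names and the `parse_request` helper are hypothetical, and the version-based selection only illustrates what "more explicit" could look like. It is not the actual patch.

```python
import ctypes
import struct

# Assumed version numbering (TPACKET_V1 = 0, TPACKET_V2 = 1, TPACKET_V3 = 2).
TPACKET_V1, TPACKET_V2, TPACKET_V3 = 0, 1, 2

_BASE_FIELDS = [
    ("tp_block_size", ctypes.c_uint),
    ("tp_block_nr", ctypes.c_uint),
    ("tp_frame_size", ctypes.c_uint),
    ("tp_frame_nr", ctypes.c_uint),
]


class TpacketReq(ctypes.Structure):
    """Model of the smaller request: four 32-bit fields."""
    _fields_ = _BASE_FIELDS


class TpacketReq3(ctypes.Structure):
    """Model of the larger request: the same four fields plus three more."""
    _fields_ = _BASE_FIELDS + [
        ("tp_retire_blk_tov", ctypes.c_uint),
        ("tp_sizeof_priv", ctypes.c_uint),
        ("tp_feature_req_word", ctypes.c_uint),
    ]


class TpacketReqU(ctypes.Union):
    """Model of the union whose first member is smaller than the union."""
    _fields_ = [("req", TpacketReq), ("req3", TpacketReq3)]


def parse_request(version, payload):
    """Copy into the member that matches the version, sized by that member."""
    member_type = TpacketReq3 if version >= TPACKET_V3 else TpacketReq
    size = ctypes.sizeof(member_type)
    if len(payload) < size:
        raise ValueError("option too short")
    return member_type.from_buffer_copy(payload[:size])


if __name__ == "__main__":
    print(ctypes.sizeof(TpacketReq), ctypes.sizeof(TpacketReq3))
    req3 = parse_request(TPACKET_V3, struct.pack("7I", *range(7)))
    print(req3.tp_feature_req_word)
```

Copying the larger size through a pointer to the smaller member is what a field-bounds checker flags, even though the enclosing union is large enough. Naming the destination member per version removes the mismatch.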
      net: handle HAS_IOPORT dependencies · a29689e6
      Niklas Schnelle authored
      In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at
      compile time. We thus need to add HAS_IOPORT as dependency for
      those drivers requiring them. For the DEFXX driver the use of I/O
      ports is optional and we only need to fence specific code paths. It also
      turns out that with HAS_IOPORT handled explicitly HAMRADIO does not need
      the !S390 dependency and successfully builds the bpqether driver.
      Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Co-developed-by: Arnd Bergmann <arnd@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@kernel.org>
      Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Merge branch 'mptcp-selftests' · 6e51d914
      David S. Miller authored
      Matthieu Baerts says:
      
      ====================
      selftests: mptcp: cleanups and 'ip mptcp' support
      
      Here are some patches from Geliang, doing different cleanups, and
      supporting 'ip mptcp' in more MPTCP selftests.
      
      Patch 1 checks that TC is available in selftests requiring it.
      
      Patch 2 adds 'ms' units in TC commands, to avoid confusion.
      
      Patches 3-9 are prerequisites for patch 10: some export code from
      mptcp_join.sh to mptcp_lib.sh, to be re-used in pm_netlink.sh,
      mptcp_sockopt.sh and simult_flows.sh; others add helpers to
      pm_netlink.sh to easily support both the 'ip mptcp' and 'pm_nl_ctl'
      tools to interact with the in-kernel MPTCP path-manager.
      
      Patch 10 adds a '-i' parameter in mptcp_sockopt.sh, pm_netlink.sh, and
      simult_flows.sh to use 'ip mptcp' tool instead of 'pm_nl_ctl'.
      
      Patch 11 fixes some ShellCheck warnings in pm_netlink.sh, in order to
      drop a ShellCheck 'disable' directive.
      ====================
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      selftests: mptcp: netlink: drop disable=SC2086 · 6eaeda12
      Geliang Tang authored
      Only a few variables are still not double-quoted. Quote them so that
      "shellcheck disable=SC2086" can be dropped.
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      selftests: mptcp: ip_mptcp option for more scripts · 0cef6fca
      Geliang Tang authored
      This patch adds a '-i' option to mptcp_sockopt.sh, pm_netlink.sh, and
      simult_flows.sh, to use the 'ip mptcp' command in the tests instead of
      'pm_nl_ctl'. Update usage() correspondingly.
      Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>