1. 05 Mar, 2024 34 commits
    • net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list() · 00af2aa9
      Eric Dumazet authored
      Many syzbot reports show extreme rtnl pressure, and many of them hint
      that smc acquires rtnl in netns creation for no good reason [1].
      
      This patch returns early from smc_pnet_net_init()
      if there is no netdevice yet.
      
      I am not sure why smc_pnet_create_pnetids_list() even exists,
      because smc_pnet_netdev_event() is also calling
      smc_pnet_add_base_pnetid() when handling NETDEV_UP event.
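      A minimal userspace sketch of the early-return idea, assuming a toy model: fake_netns, fake_pnet_net_init() and the lock counter are hypothetical stand-ins, not the kernel's struct net, smc_pnet_net_init() or rtnl_lock(). It only shows that a namespace with no netdevice never touches the global lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT kernel types or APIs. */
struct fake_netns {
    size_t n_netdevs;   /* netdevices present when init runs */
};

/* Counts acquisitions of the global lock (models rtnl_lock()). */
static unsigned int global_lock_acquisitions;

static void fake_global_lock(void)   { global_lock_acquisitions++; }
static void fake_global_unlock(void) { /* nothing to release in the model */ }

/* Returns how many devices were scanned for a base pnetid. */
static size_t fake_pnet_net_init(const struct fake_netns *ns)
{
    size_t scanned;

    /* Early return: a freshly created namespace has no netdevice yet,
     * so there is nothing to scan and no reason to take the lock. */
    if (ns->n_netdevs == 0)
        return 0;

    fake_global_lock();
    scanned = ns->n_netdevs;   /* stand-in for walking the device list */
    fake_global_unlock();
    return scanned;
}
```

      Following the commit's reasoning, devices that show up later are still handled by smc_pnet_netdev_event() on NETDEV_UP, which is why skipping the scan for an empty namespace loses nothing.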
      
      [1] extract of typical syzbot reports
      
      2 locks held by syz-executor.3/12252:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.4/12253:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.1/12257:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.2/12261:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.0/12265:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.3/12268:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.4/12271:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.1/12274:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      2 locks held by syz-executor.2/12280:
        #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline]
        #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wenjia Zhang <wenjia@linux.ibm.com>
      Cc: Jan Karcher <jaka@linux.ibm.com>
      Cc: "D. Wythe" <alibuda@linux.alibaba.com>
      Cc: Tony Lu <tonylu@linux.alibaba.com>
      Cc: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240302100744.3868021-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge tag 'linux-can-next-for-6.9-20240304' of... · eead0599
      Paolo Abeni authored
      Merge tag 'linux-can-next-for-6.9-20240304' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can-next 2024-03-04
      
      this is a pull request of 4 patches for net-next/master.
      
      The 1st patch is by Jimmy Assarsson and adds support for the Leaf v3
      to the kvaser_usb driver.
      
      Martin Jocić's patch targets the kvaser_pciefd driver and adds support
      for the Kvaser PCIe 8xCAN device.
      
      Followed by a patch by me that adds a missing cpu_to_le32() to the
      gs_usb driver; the change is not critical as the assigned value is 0.
      
      The last patch is also by me and replaces a literal 256 with a proper
      define.
      
      linux-can-next-for-6.9-20240304
      
      * tag 'linux-can-next-for-6.9-20240304' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
        can: mcp251xfd: __mcp251xfd_get_berr_counter(): use CAN_BUS_OFF_THRESHOLD instead of open coding it
        can: gs_usb: gs_cmd_reset(): use cpu_to_le32() to assign mode
        can: kvaser_pciefd: Add support for Kvaser PCIe 8xCAN
        can: kvaser_usb: Add support for Leaf v3
      ====================
      
      Link: https://lore.kernel.org/r/20240304092051.3631481-1-mkl@pengutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: Re-use and set mono_delivery_time bit for userspace tstamp packets · 885c36e5
      Abhishek Chauhan authored
      The bridge driver currently has no support for forwarding packets that
      carry a userspace timestamp and ends up resetting the timestamp. The ETF
      qdisc then checks a packet coming from userspace, finds its timestamp to
      be 0, and drops the time-sensitive packet. This change allows packets
      with userspace timestamps to be forwarded from the bridge to NIC drivers.
      
      Set the same bit (mono_delivery_time) to avoid dropping userspace
      timestamped packets in the forwarding path.
      
      The existing behaviour of mono_delivery_time is unchanged; it is only
      extended with userspace timestamp support for the bridge
      forwarding path.
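      A toy model of the forwarding behaviour described above, assuming hypothetical names (fake_pkt, fake_forward_clear_tstamp(), fake_etf_accepts()); it is not the kernel's sk_buff code. A marked packet keeps its timestamp across forwarding, while an unmarked one has it reset to 0 and is then rejected by an ETF-like check, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; field names echo the commit text, not struct sk_buff. */
struct fake_pkt {
    uint64_t tstamp;           /* userspace transmit time, 0 means "none" */
    bool mono_delivery_time;   /* marks a timestamp that must survive forwarding */
};

/* Forwarding step: resets the timestamp unless the packet is marked. */
static void fake_forward_clear_tstamp(struct fake_pkt *p)
{
    if (p->mono_delivery_time)
        return;
    p->tstamp = 0;
}

/* ETF-like admission check as described above: a zero timestamp is dropped. */
static bool fake_etf_accepts(const struct fake_pkt *p)
{
    return p->tstamp != 0;
}
```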
      Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240301201348.2815102-1-quic_abchauha@quicinc.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'net-gro-cleanups-and-fast-path-refinement' · d35c9659
      Paolo Abeni authored
      Eric Dumazet says:
      
      ====================
      net: gro: cleanups and fast path refinement
      
      The current GRO stack has a 'fast path' for a subset of drivers:
      users of napi_frags_skb().
      
      With TCP zerocopy/direct uses, header split at receive is becoming
      more important, and the GRO fast path is disabled in that case.
      
      This series makes GRO (a bit) more efficient for almost all use cases.
      ====================
      
      Link: https://lore.kernel.org/r/20240301193740.3436871-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tcp: gro: micro optimizations in tcp[4]_gro_complete() · 8f78010b
      Eric Dumazet authored
      In tcp_gro_complete():
      
      Moving the skb->inner_transport_header setting
      allows the compiler to reuse the previously loaded value
      of skb->transport_header.
      
      Caching skb_shinfo() avoids duplicate lookups as well.
      
      In tcp4_gro_complete(), doing a single change on
      skb_shinfo(skb)->gso_type also generates better code.
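      A sketch of the single gso_type update, assuming the two updates being merged are an unconditional flag plus a conditional one, and using hypothetical flag values (FAKE_GSO_*) rather than the real SKB_GSO_* constants. Multiplying a 0/1 flag by a bit yields either 0 or that bit, so one read-modify-write produces the same result as the two-step version.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real SKB_GSO_* constants differ. */
#define FAKE_GSO_TCPV4        (1u << 0)
#define FAKE_GSO_TCP_FIXEDID  (1u << 3)

struct fake_shinfo {
    unsigned int gso_type;
};

/* Two read-modify-write updates, one of them conditional. */
static void mark_two_steps(struct fake_shinfo *sh, bool fixedid)
{
    sh->gso_type |= FAKE_GSO_TCPV4;
    if (fixedid)
        sh->gso_type |= FAKE_GSO_TCP_FIXEDID;
}

/* A single update: fixedid * FLAG is either 0 or FLAG, so the source
 * needs no explicit branch and only one write to gso_type. */
static void mark_one_step(struct fake_shinfo *sh, bool fixedid)
{
    sh->gso_type |= FAKE_GSO_TCPV4 | (fixedid * FAKE_GSO_TCP_FIXEDID);
}
```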
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: gro: enable fast path for more cases · c7583e9f
      Eric Dumazet authored
      Currently the so-called GRO fast path is only enabled for
      napi_frags_skb() callers.
      
      After the prior patch, we no longer have to clear frag0 whenever
      we pull bytes into skb->head.
      
      We can therefore initialize frag0 to skb->data so that the GRO
      fast path can be used in the following additional cases:
      
      - Drivers using header split (populating skb->data with headers,
        and having payload in one or more page fragments).
      
      - Drivers not using any page frag (entire packet is in skb->data).
      
      Add a likely() in skb_gro_may_pull() to help the compiler
      to generate better code if possible.
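      A userspace model of the idea, assuming a hypothetical fake_gro_skb with only the fields needed here (it is not NAPI_GRO_CB()): frag0 points at the linear data, and the may-pull check compares the requested header length with frag0_len. The likely() macro is a local definition built on the GCC/Clang __builtin_expect() builtin, the same way the kernel macro is built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local branch hint, defined the same way as the kernel's likely(). */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical, trimmed-down GRO state; not the kernel layout. */
struct fake_gro_skb {
    const unsigned char *data;  /* linear area: headers, or the whole packet */
    size_t headlen;             /* bytes available in the linear area */
    const unsigned char *frag0; /* fast-path pointer used to read headers */
    size_t frag0_len;           /* bytes readable through frag0 */
};

/* Point the fast path at the linear data, which is where header-split
 * and fully linear packets keep their headers. */
static void fake_gro_init_fast_path(struct fake_gro_skb *skb)
{
    skb->frag0 = skb->data;
    skb->frag0_len = skb->headlen;
}

/* True when hlen bytes of headers can be read directly via frag0. */
static bool fake_gro_may_pull(const struct fake_gro_skb *skb, size_t hlen)
{
    return likely(hlen <= skb->frag0_len);
}
```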
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: gro: change skb_gro_network_header() · bd56a29c
      Eric Dumazet authored
      Change skb_gro_network_header() to accept a const sk_buff
      and to no longer check if frag0 is NULL or not.
      
      This allows removing skb_gro_frag0_invalidate(),
      which shows up in profiles when header split is enabled.
      
      The sk_buff parameter is also constified for skb_gro_header_fast(),
      inet_gro_compute_pseudo() and ip6_gro_compute_pseudo().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: gro: rename skb_gro_header_hard() · 93e16ea0
      Eric Dumazet authored
      skb_gro_header_hard() is renamed to skb_gro_may_pull() to match
      the convention used by common helpers like pskb_may_pull().
      
      This means the condition is inverted:
      
      	if (skb_gro_header_hard(skb, hlen))
      		slow_path();
      
      becomes:
      
      	if (!skb_gro_may_pull(skb, hlen))
      		slow_path();
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'mt7530-dsa-subdriver-improvements-act-iii' · 9452c8b4
      Paolo Abeni authored
       says:
      
      ====================
      MT7530 DSA Subdriver Improvements Act III
      
      This is the third patch series with the goal of simplifying the MT7530 DSA
      subdriver and improving support for MT7530, MT7531, and the switch on the
      MT7988 SoC.
      
      With this patch series applied, I have done a simple ping test to confirm
      basic communication on all switch ports of the MCM and standalone MT7530,
      and of the MT7531 switch.
      
      MT7621 Unielec, MCM MT7530:
      
      rgmii-only-gmac0-mt7621-unielec-u7621-06-16m.dtb
      gmac0-and-gmac1-mt7621-unielec-u7621-06-16m.dtb
      
      tftpboot 0x80008000 mips-uzImage.bin; tftpboot 0x83000000 mips-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootm 0x80008000 0x83000000 0x83f00000
      
      MT7622 Bananapi, MT7531:
      
      gmac0-and-gmac1-mt7622-bananapi-bpi-r64.dtb
      
      tftpboot 0x40000000 arm64-Image; tftpboot 0x45000000 arm64-rootfs.cpio.uboot; tftpboot 0x4a000000 $dtb; booti 0x40000000 0x45000000 0x4a000000
      
      MT7623 Bananapi, standalone MT7530:
      
      rgmii-only-gmac0-mt7623n-bananapi-bpi-r2.dtb
      gmac0-and-gmac1-mt7623n-bananapi-bpi-r2.dtb
      
      tftpboot 0x80008000 arm-zImage; tftpboot 0x83000000 arm-rootfs.cpio.uboot; tftpboot 0x83f00000 $dtb; bootz 0x80008000 0x83000000 0x83f00000
      
      This patch series is the continuation of the patch series linked below.
      
      https://lore.kernel.org/r/20230522121532.86610-1-arinc.unal@arinc9.com
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      ---
      Changes in v3:
      - Patch 8
        - Properly explain the behaviour of setting link down on all ports at
          setup.
        - Split the changes for simplifying the link settings operations out to
          another patch.
      - Link to v2: https://lore.kernel.org/r/20240216-for-netnext-mt7530-improvements-3-v2-0-094cae3ff23b@arinc9.com
      
      Changes in v2:
      - Patch 8
        - Use a single mt7530_rmw() instead of two mt7530_clear() and
          mt7530_set() commands.
      - Link to v1: https://lore.kernel.org/r/20240208-for-netnext-mt7530-improvements-3-v1-0-d7c1cfd502ca@arinc9.com
      ====================
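      The v2 change listed above replaces a clear followed by a set with one read-modify-write. A minimal sketch of why the two are equivalent, assuming a hypothetical register file (fake_regs) and helpers that only echo, and are not, mt7530_clear(), mt7530_set() and mt7530_rmw(); the register index, mask and values are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file with a write counter; not the driver's API. */
struct fake_regs {
    uint32_t val[16];
    unsigned int writes;   /* number of register writes issued */
};

static void fake_write(struct fake_regs *r, unsigned int reg, uint32_t v)
{
    r->val[reg] = v;
    r->writes++;
}

static void fake_clear(struct fake_regs *r, unsigned int reg, uint32_t bits)
{
    fake_write(r, reg, r->val[reg] & ~bits);
}

static void fake_set(struct fake_regs *r, unsigned int reg, uint32_t bits)
{
    fake_write(r, reg, r->val[reg] | bits);
}

/* One read, one write: clear everything in mask, then OR in set. */
static void fake_rmw(struct fake_regs *r, unsigned int reg,
                     uint32_t mask, uint32_t set)
{
    fake_write(r, reg, (r->val[reg] & ~mask) | set);
}
```

      Both paths compute (old & ~mask) | set, so the final register value is identical; the rmw form simply halves the number of writes.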
      
      Link: https://lore.kernel.org/r/20240301-for-netnext-mt7530-improvements-3-v3-0-449f4f166454@arinc9.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: simplify link operations · b04097c7
      Arınç ÜNAL authored
      The "MT7621 Giga Switch Programming Guide v0.3", "MT7531 Reference Manual
      for Development Board v1.0", and "MT7988A Wi-Fi 7 Generation Router
      Platform: Datasheet (Open Version) v0.1" documents show that these bits are
      enabled at reset:
      
      PMCR_IFG_XMIT(1) (not part of PMCR_LINK_SETTINGS_MASK)
      PMCR_MAC_MODE (not part of PMCR_LINK_SETTINGS_MASK)
      PMCR_TX_EN
      PMCR_RX_EN
      PMCR_BACKOFF_EN (not part of PMCR_LINK_SETTINGS_MASK)
      PMCR_BACKPR_EN (not part of PMCR_LINK_SETTINGS_MASK)
      PMCR_TX_FC_EN
      PMCR_RX_FC_EN
      
      These bits also don't exist on the MT7530_PMCR_P(6) register of the switch
      on the MT7988 SoC:
      
      PMCR_IFG_XMIT()
      PMCR_MAC_MODE
      PMCR_BACKOFF_EN
      PMCR_BACKPR_EN
      
      Remove the setting of the bits not part of PMCR_LINK_SETTINGS_MASK on
      phylink_mac_config as they're already set.
      
      Setting the port to force mode is already done in mt7530_setup() and
      mt7531_setup_common(). So get rid of PMCR_FORCE_MODE_ID(), which helped
      determine which bit to use for the switch model.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: sort link settings ops and force link down on all ports · 6324230b
      Arınç ÜNAL authored
      port_enable and port_disable clear the link settings. Move that to
      mt7530_setup() and mt7531_setup_common() which set up the switches. This
      way, the link settings are cleared on all ports at setup, and then only
      once with phylink_mac_link_down() when a link goes down.
      
      Enable force mode at setup to apply the force part of the link settings.
      This ensures that disabled ports will have their link down.
      Suggested-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: put initialising PCS devices code back to original order · 3a87131e
      Arınç ÜNAL authored
      The commit fae46308 ("net: dsa: mt753x: fix pcs conversion regression")
      fixed a regression caused by cpu_port_config manually calling phylink
      operations. cpu_port_config was deemed useless and has been removed.
      Therefore, put the code initialising the PCS devices back in its original
      order.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: get rid of mt753x_mac_config() · 1192ed89
      Arınç ÜNAL authored
      There is no need for a separate function to call
      priv->info->mac_port_config(). Call it from mt753x_phylink_mac_config()
      instead and remove mt753x_mac_config().
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: get rid of priv->info->cpu_port_config() · 22fa1017
      Arınç ÜNAL authored
      priv->info->cpu_port_config() is used for MT7531 and the switch on the
      MT7988 SoC. It sets up the ports described as a CPU port earlier than the
      phylink code path would do.
      
      This function is useless as:
      - Configuring the MACs can be done from the phylink_mac_config code path
        instead.
      - All the link configuration it does on the CPU ports is later undone with
        the port_enable, phylink_mac_config, and then phylink_mac_link_up code
        path [1].
      
      priv->p5_interface and priv->p6_interface were being used to prevent
      configuring the MACs from the phylink_mac_config code path. Remove them now
      that they serve no purpose.
      
      Remove priv->info->cpu_port_config(). On mt753x_phylink_mac_config, switch
      to if statements to simplify the code.
      
      Remove the overwriting of the speed and duplex values for certain
      interface modes. Phylink already provides the speed and duplex variables
      with proper values. Phylink already sets the max speed of TRGMII to
      SPEED_1000. Add SPEED_2500 for PHY_INTERFACE_MODE_2500BASEX to where the
      speed and EEE bits are set instead.
      
      On the switch on the MT7988 SoC, PHY_INTERFACE_MODE_INTERNAL is being used
      to describe the interface mode of the 10G MAC, which is of port 6. On
      mt7988_cpu_port_config() PMCR_FORCE_SPEED_1000 was set via the
      PMCR_CPU_PORT_SETTING() mask. Add SPEED_10000 case to where the speed bits
      are set to cover this. No need to add it to where the EEE bits are set as
      the "MT7988A Wi-Fi 7 Generation Router Platform: Datasheet (Open Version)
      v0.1" document shows that these bits don't exist on the MT7530_PMCR_P(6)
      register.
      
      Remove the definition of PMCR_CPU_PORT_SETTING() now that it serves no
      purpose.
      
      Change mt753x_cpu_port_enable() to void now that there are no error cases
      left.
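      A sketch of the speed handling described above, assuming hypothetical bit values (FAKE_PMCR_FORCE_SPEED_*) and a simplified function that is not the driver's link-up code. It also assumes, as the text implies for the 10G MAC, that 2500 and 10000 Mbps reuse the 1000 Mbps force-speed bit, and it uses if statements in the spirit of the change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real PMCR_* values live in the driver. */
#define FAKE_PMCR_FORCE_SPEED_1000 (1u << 3)
#define FAKE_PMCR_FORCE_SPEED_100  (1u << 2)

/* Returns the force-speed bits for a link speed in Mbps.
 * 10 Mbps needs neither bit in this model. */
static uint32_t fake_force_speed_bits(int speed_mbps)
{
    if (speed_mbps == 1000 || speed_mbps == 2500 || speed_mbps == 10000)
        return FAKE_PMCR_FORCE_SPEED_1000;
    if (speed_mbps == 100)
        return FAKE_PMCR_FORCE_SPEED_100;
    return 0;
}
```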
      
      Link: https://lore.kernel.org/netdev/ZHy2jQLesdYFMQtO@shell.armlinux.org.uk/ [1]
      Suggested-by: Russell King (Oracle) <linux@armlinux.org.uk>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: get rid of useless error returns on phylink code path · adf4ae24
      Arınç ÜNAL authored
      Remove error returns in the cases that are already handled by the
      function that the mac_port_get_caps member of mt753x_table points to.
      
      mt7531_mac_config() is also called from mt7531_cpu_port_config() outside of
      phylink but the port and interface modes are already handled there.
      
      Change the functions and the mac_port_config function pointer to void now
      that there are no error returns anymore.
      
      Remove mt753x_is_mac_port(), which was only used by these error returns.
      
      On mt7531_mac_config(), switch to if statements to simplify the code.
      
      Remove internal PHY cases from mt753x_phylink_mac_config(); there is no
      need to check the interface mode, as that's already handled by the
      function that the mac_port_get_caps member of mt753x_table points to.
      Acked-by: Daniel Golle <daniel@makrotopia.org>
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: do not use SW_PHY_RST to reset MT7531 switch · a565f98d
      Arınç ÜNAL authored
      According to the document MT7531 Reference Manual for Development Board
      v1.0, the SW_PHY_RST bit on the SYS_CTRL register doesn't exist for
      MT7531. This is likely why forcing link down on all ports is necessary for
      MT7531.
      
      Therefore, do not set SW_PHY_RST on mt7531_setup().
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: set interrupt register only for MT7530 · 804cd5f7
      Arınç ÜNAL authored
      Setting this register related to interrupts is only needed for the MT7530
      switch. Make an exclusive check to ensure this.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Acked-by: Daniel Golle <daniel@makrotopia.org>
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: remove .mac_port_config for MT7988 and make it optional · 6ebe414b
      Arınç ÜNAL authored
      For the switch on the MT7988 SoC, the mac_port_config member for ID_MT7988
      in mt753x_table is not needed as the interfaces of all MACs are already
      handled on mt7988_mac_port_get_caps().
      
      Therefore, remove the mac_port_config member from ID_MT7988 in
      mt753x_table. Before calling priv->info->mac_port_config(), if there's no
      mac_port_config member in mt753x_table, exit mt753x_mac_config()
      successfully.
      
      Remove calling priv->info->mac_port_config() from the sanity check as the
      sanity check requires a pointer to a mac_port_config function to be
      non-NULL. This will fail for MT7988 as mac_port_config won't be a member of
      its info table.
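      The optional-callback pattern can be sketched as follows, assuming a hypothetical fake_switch_info table; the names only echo mt753x_table and mac_port_config and are not the driver's definitions. A NULL pointer simply means there is nothing to configure, so the caller returns without error instead of tripping a sanity check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-model info table with an optional hook. */
struct fake_switch_info {
    const char *name;
    void (*mac_port_config)(int port);   /* may be NULL */
};

static int configured_ports;   /* counts hook invocations */

static void fake_mt7531_mac_port_config(int port)
{
    (void)port;
    configured_ports++;
}

/* Calls the hook only when the model provides one. */
static void fake_mac_config(const struct fake_switch_info *info, int port)
{
    if (info->mac_port_config)
        info->mac_port_config(port);
}

static const struct fake_switch_info fake_mt7531 = {
    .name = "MT7531",
    .mac_port_config = fake_mt7531_mac_port_config,
};

/* Like ID_MT7988 after the change: no mac_port_config member. */
static const struct fake_switch_info fake_mt7988 = { .name = "MT7988" };
```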
      Co-developed-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'remove-page-frag-implementation-in-vhost_net' · 6702d60d
      Paolo Abeni authored
      Yunsheng Lin says:
      
      ====================
      remove page frag implementation in vhost_net
      
      Currently there are three implementations for page frag:
      
      1. mm/page_alloc.c: net stack seems to be using it in the
         rx part with 'struct page_frag_cache' and the main API
         being page_frag_alloc_align().
      2. net/core/sock.c: net stack seems to be using it in the
         tx part with 'struct page_frag' and the main API being
         skb_page_frag_refill().
      3. drivers/vhost/net.c: vhost seems to be using it to build
         xdp frame, and its implementation seems to be a mix of
         the above two.
      
      This patchset tries to unify the page frag implementation a
      little bit by unifying the gfp bits for order 3 page allocation
      and replacing page frag implementation in vhost.c with the
      one in page_alloc.c.
      
      After this patchset, we are not only able to unify the page
      frag implementation a little, but also get a performance boost
      of about 0.5% when testing with the vhost_net_test introduced
      in the last patch.
      
      Before this patchset:
      Performance counter stats for './vhost_net_test' (10 runs):
      
           305325.78 msec task-clock                       #    1.738 CPUs utilized               ( +-  0.12% )
             1048668      context-switches                 #    3.435 K/sec                       ( +-  0.00% )
                  11      cpu-migrations                   #    0.036 /sec                        ( +- 17.64% )
                  33      page-faults                      #    0.108 /sec                        ( +-  0.49% )
        244651819491      cycles                           #    0.801 GHz                         ( +-  0.43% )  (64)
         64714638024      stalled-cycles-frontend          #   26.45% frontend cycles idle        ( +-  2.19% )  (67)
         30774313491      stalled-cycles-backend           #   12.58% backend cycles idle         ( +-  7.68% )  (70)
        201749748680      instructions                     #    0.82  insn per cycle
                                                    #    0.32  stalled cycles per insn     ( +-  0.41% )  (66.76%)
         65494787909      branches                         #  214.508 M/sec                       ( +-  0.35% )  (64)
          4284111313      branch-misses                    #    6.54% of all branches             ( +-  0.45% )  (66)
      
             175.699 +- 0.189 seconds time elapsed  ( +-  0.11% )
      
      After this patchset:
      Performance counter stats for './vhost_net_test' (10 runs):
      
           303974.38 msec task-clock                       #    1.739 CPUs utilized               ( +-  0.14% )
             1048807      context-switches                 #    3.450 K/sec                       ( +-  0.00% )
                  14      cpu-migrations                   #    0.046 /sec                        ( +- 12.86% )
                  33      page-faults                      #    0.109 /sec                        ( +-  0.46% )
        251289376347      cycles                           #    0.827 GHz                         ( +-  0.32% )  (60)
         67885175415      stalled-cycles-frontend          #   27.01% frontend cycles idle        ( +-  0.48% )  (63)
         27809282600      stalled-cycles-backend           #   11.07% backend cycles idle         ( +-  0.36% )  (71)
        195543234672      instructions                     #    0.78  insn per cycle
                                                    #    0.35  stalled cycles per insn     ( +-  0.29% )  (69.04%)
         62423183552      branches                         #  205.357 M/sec                       ( +-  0.48% )  (67)
          4135666632      branch-misses                    #    6.63% of all branches             ( +-  0.63% )  (67)
      
             174.764 +- 0.214 seconds time elapsed  ( +-  0.12% )
      
      Changelog:
      V6: Add timeout for poll() and simplify some logic as suggested
          by Jason.
      
      V5: Address the comment from Jason in vhost_net_test.c and the
          comment about leaving out the gfp change for page frag in
          sock.c as suggested by Paolo.
      
      V4: Resend based on latest net-next branch.
      
      V3:
      1. Add __page_frag_alloc_align() which is passed with the align mask
         the original function expected as suggested by Alexander.
      2. Drop patch 3 in v2 suggested by Alexander.
      3. Reorder patch 4 & 5 in v2 suggested by Alexander.
      
      Note that placing this gfp flags handling for order 3 page in an inline
      function is not considered, as we may be able to unify the page_frag
      and page_frag_cache handling.
      
      V2: Change 'xor'd' to 'masked off', add vhost tx testing for
          vhost_net_test.
      
      V1: Fix some typos, drop RFC tag and rebase on latest net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20240228093013.8263-1-linyunsheng@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • tools: virtio: introduce vhost_net_test · c5d3705c
      Yunsheng Lin authored
      Introduce vhost_net_test, based on virtio_test, to test changes
      to vhost_net tx and rx in the kernel.
      
      Steps for vhost_net tx testing:
      1. Prepare an out buf.
      2. Kick the vhost_net to do tx processing.
      3. Do the receiving in the tun side.
      4. Verify the data received by tun is correct.
      
      Steps for vhost_net rx testing:
      1. Prepare an in buf.
      2. Do the sending in the tun side.
      3. Kick the vhost_net to do rx processing.
      4. Verify the data received by vhost_net is correct.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • vhost/net: remove vhost_net_page_frag_refill() · 4051bd81
      Yunsheng Lin authored
      The page frag in vhost_net_page_frag_refill() uses the
      'struct page_frag' from skb_page_frag_refill(), but its
      implementation is similar to page_frag_alloc_align() now.
      
      This patch removes vhost_net_page_frag_refill() by using
      'struct page_frag_cache' instead of 'struct page_frag',
      and allocating frag using page_frag_alloc_align().
      
      As an added benefit, this not only unifies the page frag
      implementation a little, but also gives a performance boost of
      about 0.5% when testing with the vhost_net_test introduced in the
      last patch.
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: introduce page_frag_cache_drain() · a0727489
      Yunsheng Lin authored
      When draining a page_frag_cache, most users perform
      similar steps, so introduce an API to avoid code
      duplication.
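      A userspace model of what such a drain helper has to do, assuming a hypothetical fake_page with a plain reference count and a cache that still owns a batch of pre-charged references on its current page (its bias). The names are illustrative only; the real helper operates on struct page_frag_cache and struct page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page with a plain reference count. */
struct fake_page {
    unsigned int refcount;
};

/* Hypothetical cache: 'va' is the current page (NULL when empty) and
 * 'bias' is the number of references the cache still owns on it. */
struct fake_drain_cache {
    struct fake_page *va;
    unsigned int bias;
};

/* Drop every reference the cache still owns and forget the page.
 * Calling it on an already empty cache does nothing. */
static void fake_frag_cache_drain(struct fake_drain_cache *nc)
{
    if (!nc->va)
        return;
    nc->va->refcount -= nc->bias;   /* in the kernel, 0 would free the page */
    nc->va = NULL;
    nc->bias = 0;
}
```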
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • page_frag: unify gfp bits for order 3 page allocation · 4bc0d63a
      Yunsheng Lin authored
      Currently there seem to be three page frag implementations,
      all of which try to allocate an order 3 page and fall back to
      an order 0 page if that fails. Each of them allows the order 3
      allocation to fail under certain conditions by using specific
      gfp bits.
      
      The gfp bits for order 3 page allocation differ between the
      implementations: __GFP_NOMEMALLOC is or'd to forbid access to
      emergency memory reserves in __page_frag_cache_refill(), but it
      is not or'd in the other implementations; __GFP_DIRECT_RECLAIM is
      masked off to avoid direct reclaim in vhost_net_page_frag_refill(),
      but it is not masked off in __page_frag_cache_refill().
      
      This patch unifies the gfp bits used by the different
      implementations by or'ing __GFP_NOMEMALLOC and masking off
      __GFP_DIRECT_RECLAIM for order 3 page allocation, to avoid
      possible pressure on mm.
      
      Leave the gfp unification for the page frag implementation in sock.c
      for now, as suggested by Paolo Abeni.
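      A minimal sketch of the unified mask for the order 3 attempt, as the commit message describes it: or in __GFP_NOMEMALLOC and mask off __GFP_DIRECT_RECLAIM. The TOY_GFP_* names and bit values are hypothetical stand-ins (the kernel's __GFP_* flags have different values), and any other flags the kernel adds to this mask are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_gfp_t;

/* Hypothetical bit values for illustration only; the real __GFP_* flags
 * are defined by the kernel and have different values. */
#define TOY_GFP_DIRECT_RECLAIM 0x1u  /* may block in direct reclaim */
#define TOY_GFP_KSWAPD_RECLAIM 0x2u  /* may wake the background reclaimer */
#define TOY_GFP_NOMEMALLOC     0x4u  /* never dip into emergency reserves */

/* Simplified stand-in for a typical sleeping allocation context. */
#define TOY_GFP_KERNEL (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)

/* Mask for the opportunistic order 3 attempt: the caller's flags minus
 * direct reclaim, plus no access to emergency reserves. In this sketch
 * the order 0 fallback keeps using the caller's original mask. */
static toy_gfp_t toy_order3_gfp(toy_gfp_t gfp)
{
    return (gfp & ~TOY_GFP_DIRECT_RECLAIM) | TOY_GFP_NOMEMALLOC;
}
```

      Masking off direct reclaim keeps the large allocation cheap to fail, which is acceptable because the order 0 fallback exists.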
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
      CC: Alexander Duyck <alexander.duyck@gmail.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Yunsheng Lin's avatar
      mm/page_alloc: modify page_frag_alloc_align() to accept align as an argument · 411c5f36
      Yunsheng Lin authored
      napi_alloc_frag_align() and netdev_alloc_frag_align() accept
      align as an argument. They are thin wrappers around the
      __napi_alloc_frag_align() and __netdev_alloc_frag_align() APIs:
      the wrappers do the alignment checking and the align mask
      conversion, and the double-underscore functions then call
      page_frag_alloc_align() directly. The intention here is to keep
      the alignment checking and the align mask conversion in an inline
      wrapper, so that these operations are not done at run time, since
      they can usually be handled at compile time.
      
      We are going to use page_frag_alloc_align() in vhost_net.c, which
      needs the same kind of alignment checking and align mask conversion,
      so split page_frag_alloc_align() into an inline wrapper doing the
      above operations, and add __page_frag_alloc_align(), which is passed
      the align mask the original function expected, as suggested by
      Alexander.
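      The split between an inline wrapper and an out-of-line worker can be sketched in userspace. All toy_* names are hypothetical. The worker takes an align mask, as the commit says __page_frag_alloc_align() does, and the wrapper checks that 'align' is a power of two and converts it with -align, which equals ~(align - 1) in unsigned arithmetic. Carving fragments from the end of the buffer towards the start is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page_frag_cache: one 4 KiB buffer
 * carved from the end towards the start. */
struct toy_frag_cache {
    unsigned char buf[4096];
    unsigned int offset;
};

static bool toy_is_power_of_2(unsigned int n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Out-of-line worker: expects an align *mask*, not an alignment. */
static void *toy_frag_alloc_align_mask(struct toy_frag_cache *nc,
                                       unsigned int fragsz,
                                       unsigned int align_mask)
{
    unsigned int offset;

    if (fragsz > nc->offset)
        return NULL;                  /* the toy never refills */
    offset = (nc->offset - fragsz) & align_mask;  /* round down */
    nc->offset = offset;
    return nc->buf + offset;
}

/* Inline wrapper: validates 'align' and turns it into a mask. When 'align'
 * is a compile-time constant, both steps can be folded by the compiler. */
static inline void *toy_frag_alloc_align(struct toy_frag_cache *nc,
                                         unsigned int fragsz,
                                         unsigned int align)
{
    assert(toy_is_power_of_2(align));
    return toy_frag_alloc_align_mask(nc, fragsz, -align); /* == ~(align - 1) */
}
```

      Allocating 100 bytes with 64-byte alignment from a fresh 4096-byte buffer moves the offset to 3996 and then rounds it down to 3968, a multiple of 64.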
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      CC: Alexander Duyck <alexander.duyck@gmail.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jiawen Wu's avatar
      net: txgbe: fix to clear interrupt status after handling IRQ · 0e71862a
      Jiawen Wu authored
      GPIO EOI is not set to clear the interrupt status after handling the
      interrupt. It should be done in irq_chip->irq_ack, but this function
      is not called in handle_nested_irq(). So call txgbe_gpio_irq_ack()
      manually in txgbe_gpio_irq_handler().
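      A toy model of why the parent handler has to ack on its own: if the nested handling path never calls ->irq_ack, a status bit that is only cleared by an EOI write stays set. The toy_* names are hypothetical and only model the behaviour described in the commit message; the real txgbe_gpio_irq_handler() and register layout are not reproduced, and acking after the nested handler is an assumption about ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GPIO block whose status bits stay set until software
 * writes an end-of-interrupt (EOI) for them. */
struct toy_gpio {
    uint32_t status;   /* pending interrupt bits */
    uint32_t handled;  /* bits whose nested handler has run */
};

/* Stands in for handle_nested_irq(): runs the handler but, as the
 * commit message notes for the real path, never invokes ->irq_ack. */
static void toy_handle_nested(struct toy_gpio *g, unsigned int bit)
{
    g->handled |= 1u << bit;
}

/* Stands in for txgbe_gpio_irq_ack(): the EOI write clears the bit. */
static void toy_gpio_irq_ack(struct toy_gpio *g, unsigned int bit)
{
    g->status &= ~(1u << bit);
}

/* The parent handler acks each pending bit itself, since nothing else will. */
static void toy_gpio_irq_handler(struct toy_gpio *g)
{
    uint32_t pending = g->status;

    for (unsigned int bit = 0; bit < 32; bit++) {
        if (!(pending & (1u << bit)))
            continue;
        toy_handle_nested(g, bit);
        toy_gpio_irq_ack(g, bit);
    }
}
```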
      
      Fixes: aefd0136 ("net: txgbe: use irq_domain for interrupt controller")
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Link: https://lore.kernel.org/r/20240301092956.18544-2-jiawenwu@trustnetic.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jiawen Wu's avatar
      net: txgbe: fix GPIO interrupt blocking · b4a2496c
      Jiawen Wu authored
      The GPIO interrupt status register is masked before the MAC IRQ is
      enabled, due to a hardware deficiency. So manually clear the interrupt
      status before using the interrupts; otherwise, GPIO interrupts will
      never be reported again. The workaround clears the interrupts by
      setting GPIO EOI in txgbe_up_complete().
      
      Fixes: aefd0136 ("net: txgbe: use irq_domain for interrupt controller")
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Link: https://lore.kernel.org/r/20240301092956.18544-1-jiawenwu@trustnetic.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Jakub Kicinski's avatar
      Merge branch 'intel-wired-lan-driver-updates-2024-02-28-ixgbe-igc-igb-e1000e-e100' · b307e25d
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2024-02-28 (ixgbe, igc, igb, e1000e, e100)
      
      This series contains updates to ixgbe, igc, igb, e1000e, and e100
      drivers.
      
      Jon Maxwell makes module parameter values readable in sysfs for ixgbe,
      igb, and e100.
      
      Ernesto Castellotti adds support for 1000BASE-BX on ixgbe.
      
      Arnd Bergmann fixes a build failure caused by a dependency issue in igc.
      
      Vitaly refactors an error check in e1000e to make it more concise and
      prevent future issues.
      
      v1: https://lore.kernel.org/netdev/20240229004135.741586-1-anthony.l.nguyen@intel.com/
      ====================
      
      Link: https://lore.kernel.org/r/20240301184806.2634508-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Vitaly Lifshits's avatar
      e1000e: Minor flow correction in e1000_shutdown function · 662200e3
      Vitaly Lifshits authored
      Add curly braces in the e1000_shutdown function to avoid entering an
      if statement where it is not always required.
      This improves code readability and might prevent non-deterministic
      behaviour in the future.
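      A hypothetical pair of functions (not the e1000_shutdown() code itself) showing the kind of flow issue that braces fix: without them, a follow-up check on a return value runs on every path, including paths that never made the call that sets the value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper that fails, standing in for a call whose result
 * should only be checked on the path that made it. */
static int toy_enable_feature(void)
{
    return -5;
}

/* Braces missing: 'if (retval)' runs on every path, so a stale value
 * passed in by the caller can leak out even when nothing was called. */
static int toy_shutdown_unbraced(bool can_enable, int retval)
{
    if (can_enable)
        retval = toy_enable_feature();
    if (retval)
        return retval;
    return 0;
}

/* Braces added: the check only runs right after the call that set retval. */
static int toy_shutdown_braced(bool can_enable, int retval)
{
    if (can_enable) {
        retval = toy_enable_feature();
        if (retval)
            return retval;
    }
    return 0;
}
```

      When the call is skipped, the unbraced version returns whatever stale value it was given, while the braced version returns 0.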
      Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240301184806.2634508-5-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Arnd Bergmann's avatar
      igc: fix LEDS_CLASS dependency · 30654f0e
      Arnd Bergmann authored
      When IGC is built-in but LEDS_CLASS is a loadable module, there is
      a link failure:
      
      x86_64-linux-ld: drivers/net/ethernet/intel/igc/igc_leds.o: in function `igc_led_setup':
      igc_leds.c:(.text+0x75c): undefined reference to `devm_led_classdev_register_ext'
      
      Add another dependency that prevents this combination.
      
      Fixes: ea578703 ("igc: Add support for LEDs on i225/i226")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240301184806.2634508-4-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Ernesto Castellotti's avatar
      ixgbe: Add 1000BASE-BX support · 1b43e0d2
      Ernesto Castellotti authored
      Add support for 1000BASE-BX, i.e. Gigabit Ethernet over a single strand
      of single-mode fiber.
      The initialization of a 1000BASE-BX SFP is the same as for 1000BASE-SX/LX,
      the only difference being that the Bit Rate Nominal Value must be
      checked to make sure it is a Gigabit Ethernet transceiver, as described
      by the SFF-8472 specification.
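      A minimal sketch of the check, assuming the SFF-8472 A0h layout: byte 6 carries the Ethernet compliance codes (bit 6, 0x40, is BASE-BX10) and byte 12 the nominal bit rate in units of 100 MBd. The tested module's ethtool output is consistent with this: the fourth transceiver code byte (byte 6) is 0x40 and "BR, Nominal" is 1300MBd, i.e. 13. The 10 to 13 acceptance window and all names are hypothetical, not the values or helpers ixgbe uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SFF-8472 A0h offsets: Ethernet compliance codes and nominal bit rate. */
#define SFF_ETH_COMP_CODES 6
#define SFF_BR_NOMINAL     12     /* units of 100 MBd */
#define SFF_BASE_BX10      0x40   /* bit 6 of byte 6 */

/* Hypothetical window for a Gigabit part (1.25 GBd line rate). */
#define TOY_BR_1G_MIN 10          /* 1.0 GBd */
#define TOY_BR_1G_MAX 13          /* 1.3 GBd */

/* The BX10 bit alone does not say which speed the module runs at, so the
 * nominal bit rate decides whether it is a Gigabit transceiver. */
static bool toy_is_1000base_bx(const uint8_t *eeprom)
{
    uint8_t br = eeprom[SFF_BR_NOMINAL];

    return (eeprom[SFF_ETH_COMP_CODES] & SFF_BASE_BX10) &&
           br >= TOY_BR_1G_MIN && br <= TOY_BR_1G_MAX;
}
```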
      
      This was tested with the FS.com SFP-GE-BX 1310/1490nm 10km transceiver:
      $ ethtool -m eth4
              Identifier                                : 0x03 (SFP)
              Extended identifier                       : 0x04 (GBIC/SFP defined by 2-wire interface ID)
              Connector                                 : 0x07 (LC)
              Transceiver codes                         : 0x00 0x00 0x00 0x40 0x00 0x00 0x00 0x00 0x00
              Transceiver type                          : Ethernet: BASE-BX10
              Encoding                                  : 0x01 (8B/10B)
              BR, Nominal                               : 1300MBd
              Rate identifier                           : 0x00 (unspecified)
              Length (SMF,km)                           : 10km
              Length (SMF)                              : 10000m
              Length (50um)                             : 0m
              Length (62.5um)                           : 0m
              Length (Copper)                           : 0m
              Length (OM3)                              : 0m
              Laser wavelength                          : 1310nm
              Vendor name                               : FS
              Vendor OUI                                : 64:9d:99
              Vendor PN                                 : SFP-GE-BX
              Vendor rev                                :
              Option values                             : 0x20 0x0a
              Option                                    : RX_LOS implemented
              Option                                    : TX_FAULT implemented
              Option                                    : Power level 3 requirement
              BR margin, max                            : 0%
              BR margin, min                            : 0%
              Vendor SN                                 : S2202359108
              Date code                                 : 220307
              Optical diagnostics support               : Yes
              Laser bias current                        : 17.650 mA
              Laser output power                        : 0.2132 mW / -6.71 dBm
              Receiver signal average optical power     : 0.2740 mW / -5.62 dBm
              Module temperature                        : 47.30 degrees C / 117.13 degrees F
              Module voltage                            : 3.2576 V
              Alarm/warning flags implemented           : Yes
              Laser bias current high alarm             : Off
              Laser bias current low alarm              : Off
              Laser bias current high warning           : Off
              Laser bias current low warning            : Off
              Laser output power high alarm             : Off
              Laser output power low alarm              : Off
              Laser output power high warning           : Off
              Laser output power low warning            : Off
              Module temperature high alarm             : Off
              Module temperature low alarm              : Off
              Module temperature high warning           : Off
              Module temperature low warning            : Off
              Module voltage high alarm                 : Off
              Module voltage low alarm                  : Off
              Module voltage high warning               : Off
              Module voltage low warning                : Off
              Laser rx power high alarm                 : Off
              Laser rx power low alarm                  : Off
              Laser rx power high warning               : Off
              Laser rx power low warning                : Off
              Laser bias current high alarm threshold   : 110.000 mA
              Laser bias current low alarm threshold    : 1.000 mA
              Laser bias current high warning threshold : 100.000 mA
              Laser bias current low warning threshold  : 1.000 mA
              Laser output power high alarm threshold   : 0.7079 mW / -1.50 dBm
              Laser output power low alarm threshold    : 0.0891 mW / -10.50 dBm
              Laser output power high warning threshold : 0.6310 mW / -2.00 dBm
              Laser output power low warning threshold  : 0.1000 mW / -10.00 dBm
              Module temperature high alarm threshold   : 90.00 degrees C / 194.00 degrees F
              Module temperature low alarm threshold    : -45.00 degrees C / -49.00 degrees F
              Module temperature high warning threshold : 85.00 degrees C / 185.00 degrees F
              Module temperature low warning threshold  : -40.00 degrees C / -40.00 degrees F
              Module voltage high alarm threshold       : 3.7950 V
              Module voltage low alarm threshold        : 2.8050 V
              Module voltage high warning threshold     : 3.4650 V
              Module voltage low warning threshold      : 3.1350 V
              Laser rx power high alarm threshold       : 0.7079 mW / -1.50 dBm
              Laser rx power low alarm threshold        : 0.0028 mW / -25.53 dBm
              Laser rx power high warning threshold     : 0.6310 mW / -2.00 dBm
              Laser rx power low warning threshold      : 0.0032 mW / -24.95 dBm
      Signed-off-by: Ernesto Castellotti <ernesto@castellotti.net>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240301184806.2634508-3-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Jon Maxwell's avatar
      intel: make module parameters readable in sys filesystem · aa9870f5
      Jon Maxwell authored
      Linux users sometimes need an easy way to check the current values of
      module parameters, for example when a module has been manually reloaded
      with different parameters. Make these parameters visible and readable in
      the /sys filesystem to allow that, but don't make the "debug" module
      parameter visible, as debugging is enabled via ethtool msglvl.
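      Once a parameter is exported read-only (for example with permission 0444 in module_param()), its current value appears under /sys/module/<module>/parameters/<param>. A small userspace reader is sketched below; the helper names are hypothetical, and the module and parameter names in the usage are only examples whose presence depends on the loaded driver.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of a module parameter; -1 if it does not fit. */
static int toy_param_path(char *buf, size_t len, const char *module,
                          const char *param)
{
    int n = snprintf(buf, len, "/sys/module/%s/parameters/%s", module, param);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the parameter's current value; -1 if missing or unreadable. */
static int toy_read_param(const char *module, const char *param,
                          char *out, size_t outlen)
{
    char path[256];
    FILE *f;

    if (toy_param_path(path, sizeof(path), module, param) < 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;                    /* not loaded or not exported */
    if (!fgets(out, (int)outlen, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

      For example, toy_read_param("ixgbe", "allow_unsupported_sfp", buf, sizeof(buf)) would fill buf with the current value if the driver is loaded and exports that parameter.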
      Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20240301184806.2634508-2-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Eric Dumazet's avatar
      tcp: align tcp_sock_write_rx group · 345a6e26
      Eric Dumazet authored
      Stephen Rothwell and kernel test robot reported that some arches
      (parisc, hexagon) and/or compilers would not like the blamed commit.
      
      Let's make sure the tcp_sock_write_rx group does not start with a hole.
      
      While we are at it, correct the tcp_sock_write_tx CACHELINE_ASSERT_GROUP_SIZE(),
      since after the blamed commit the group grew to 105 bytes.
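      A userspace sketch of what "a group that starts with a hole" means. It mimics the kernel's idea of zero-length begin/end marker fields around a group of struct members (zero-length arrays are a GNU C extension); the macro and struct names are hypothetical, and aligning the begin marker is shown as one way to move the padding outside the group, not necessarily how the patch does it. The static assertion is only in the spirit of CACHELINE_ASSERT_GROUP_SIZE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero-length marker arrays delimit a field group (GNU C extension). */
#define GROUP_BEGIN(g) uint8_t group_begin__##g[0]
#define GROUP_END(g)   uint8_t group_end__##g[0]
#define GROUP_SIZE(T, g) \
    (offsetof(T, group_end__##g) - offsetof(T, group_begin__##g))

/* The begin marker lands right after 'flag', but the 8-byte field that
 * follows is padded up to its alignment: the group starts with a hole. */
struct toy_unaligned {
    uint8_t flag;
    GROUP_BEGIN(rx);
    uint64_t bytes_received;
    GROUP_END(rx);
};

/* Aligning the begin marker moves the padding in front of the group. */
struct toy_aligned {
    uint8_t flag;
    GROUP_BEGIN(rx) __attribute__((aligned(8)));
    uint64_t bytes_received;
    GROUP_END(rx);
};

/* Compile-time size budget for the group. */
_Static_assert(GROUP_SIZE(struct toy_aligned, rx) <= sizeof(uint64_t),
               "rx group grew beyond its budget");
```

      On common 64-bit ABIs the unaligned group measures 15 bytes (a 7-byte hole plus the 8-byte field), while the aligned one measures exactly 8.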
      
      Fixes: 99123622 ("tcp: remove some holes in struct tcp_sock")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/netdev/20240301121108.5d39e4f9@canb.auug.org.au/
      Closes: https://lore.kernel.org/oe-kbuild-all/202403011451.csPYOS3C-lkp@intel.com/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Simon Horman <horms@kernel.org> # build-tested
      Link: https://lore.kernel.org/r/20240301171945.2958176-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Pedro Tammela's avatar
      selftests/tc-testing: require an up to date iproute2 for blockcast tests · dcfaf1f7
      Pedro Tammela authored
      Add the dependsOn test check to all the mirred blockcast tests.
      This prevents the issue reported by LKFT, which happens when an older
      iproute2 is used to run the current tdc.
      
      Tests are skipped if the dependsOn check fails.
      Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
      Link: https://lore.kernel.org/r/20240229143825.1373550-1-pctammela@mojatatu.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Prabhav Kumar Vaish's avatar
      selftests: net: Correct couple of spelling mistakes · fb0f0230
      Prabhav Kumar Vaish authored
      Changes:
      	- "excercise" is corrected to "exercise" in drivers/net/mlxsw/spectrum-2/tc_flower.sh
      	- "mutliple" is corrected to "multiple" in drivers/net/netdevsim/ethtool-fec.sh
      Signed-off-by: Prabhav Kumar Vaish <pvkumar5749404@gmail.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240228120701.422264-1-pvkumar5749404@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 04 Mar, 2024 6 commits