1. 21 Oct, 2021 4 commits
    • sfc: Export fibre-specific supported link modes · c62041c5
      Erik Ekman authored
      The 1/10GbaseT modes were set up for cards with SFP+ cages in
      3497ed8c ("sfc: report supported link speeds on SFP connections").
      10GbaseT was likely used since no 10G fibre mode existed.
      
      The missing fibre modes for 1/10G were added to ethtool.h in 5711a982
      ("net: ethtool: add support for 1000BaseX and missing 10G link modes")
      shortly thereafter.
      
      The user guide available at https://support-nic.xilinx.com/wp/drivers
      lists support for the following cable and transceiver types in section 2.9:
      - QSFP28 100G Direct Attach Cables
      - QSFP28 100G SR Optical Transceivers (with SR4 modules listed)
      - SFP28 25G Direct Attach Cables
      - SFP28 25G SR Optical Transceivers
      - QSFP+ 40G Direct Attach Cables
      - QSFP+ 40G Active Optical Cables
      - QSFP+ 40G SR4 Optical Transceivers
      - QSFP+ to SFP+ Breakout Direct Attach Cables
      - QSFP+ to SFP+ Breakout Active Optical Cables
      - SFP+ 10G Direct Attach Cables
      - SFP+ 10G SR Optical Transceivers
      - SFP+ 10G LR Optical Transceivers
      - SFP 1000BASE‐T Transceivers
      - 1G Optical Transceivers
      (From user guide issue 28. Issue 16, which also includes older cards like
      SFN5xxx/SFN6xxx, has matching lists for 1/10/40G transceiver types.)
      
      Regarding SFP+ 10GBASE‐T transceivers the latest guide says:
      "Solarflare adapters do not support 10GBASE‐T transceiver modules."
      
      Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change
      depending on the module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR).
      Before:
      
      $ ethtool ext
      Settings for ext:
      	Supported ports: [ FIBRE ]
      	Supported link modes:   1000baseT/Full
      	                        10000baseT/Full
      	Supported pause frame use: Symmetric Receive-only
      	Supports auto-negotiation: No
      	Supported FEC modes: Not reported
      	Advertised link modes:  Not reported
      	Advertised pause frame use: No
      	Advertised auto-negotiation: No
      	Advertised FEC modes: Not reported
      	Link partner advertised link modes:  Not reported
      	Link partner advertised pause frame use: No
      	Link partner advertised auto-negotiation: No
      	Link partner advertised FEC modes: Not reported
      	Speed: 1000Mb/s
      	Duplex: Full
      	Auto-negotiation: off
      	Port: FIBRE
      	PHYAD: 255
      	Transceiver: internal
              Current message level: 0x000020f7 (8439)
                                     drv probe link ifdown ifup rx_err tx_err hw
      	Link detected: yes
      
      After:
      
      $ ethtool ext
      Settings for ext:
      	Supported ports: [ FIBRE ]
      	Supported link modes:   1000baseT/Full
      	                        1000baseX/Full
      	                        10000baseCR/Full
      	                        10000baseSR/Full
      	                        10000baseLR/Full
      	Supported pause frame use: Symmetric Receive-only
      	Supports auto-negotiation: No
      	Supported FEC modes: Not reported
      	Advertised link modes:  Not reported
      	Advertised pause frame use: No
      	Advertised auto-negotiation: No
      	Advertised FEC modes: Not reported
      	Link partner advertised link modes:  Not reported
      	Link partner advertised pause frame use: No
      	Link partner advertised auto-negotiation: No
      	Link partner advertised FEC modes: Not reported
      	Speed: 1000Mb/s
      	Duplex: Full
      	Auto-negotiation: off
      	Port: FIBRE
      	PHYAD: 255
      	Transceiver: internal
      	Supports Wake-on: g
      	Wake-on: d
              Current message level: 0x000020f7 (8439)
                                     drv probe link ifdown ifup rx_err tx_err hw
      	Link detected: yes
      
      Signed-off-by: Erik Ekman <erik@kryo.se>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 1439caa1
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Crash due to missing initialization of timer data in
         xt_IDLETIMER, from Juhee Kang.
      
      2) NF_CONNTRACK_SECMARK should be bool in Kconfig, from Vegard Nossum.
      
      3) Skip netdev events on netns removal, from Florian Westphal.
      
      4) Add testcase to show port shadowing via UDP, also from Florian.
      
      5) Remove pr_debug() code in ip6t_rt, this fixes a crash due to
         unsafe access to non-linear skbuff, from Xin Long.
      
      6) Make net/ipv4/vs/debug_level read-only from non-init netns,
         from Antoine Tenart.
      
      7) Remove bogus invocation of bash in selftests/netfilter/nft_flowtable.sh,
         also from Florian.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-fixes-2021-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · e0bfcf9c
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-fixes-2021-10-20
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · a689702a
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-10-20
      
      This series contains updates to e1000e, igc, and ice drivers.
      
      Sasha fixes an issue with dropped packets on Tiger Lake platforms for
      e1000e and corrects a device ID for igc.
      
      Tony adds missing E810 device IDs for ice.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 20 Oct, 2021 22 commits
    • net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags · 1d000323
      Emeel Hakim authored
      Current Work Queue Entry (WQE) checksum (csum) flags in the ethernet
      segment (eseg) for the IPsec crypto offload datapath are not aligned
      with PRM/HW expectations.
      
      Currently the driver always sets the l3_inner_csum flag for IPsec
      because it wrongly uses skb->encapsulation as the indicator for an
      inner IPsec header: skb->encapsulation is always set for IPsec packets,
      since IPsec itself is an encapsulation protocol. This forced failed
      attempts to calculate the csum of non-existent segments (as in the
      IP|ESP|TCP case, which has no inner L3), which led to many packet
      drops and hence low throughput.
      
      Fix by using xo->inner_ipproto as the indicator for an inner IPsec
      header instead of skb->encapsulation, and by setting the csum flags
      as follows:
      * Tunnel Mode:
      * Pkt: MAC  IP     ESP  IP    L4
      * CSUM: l3_cs | l3_inner_cs | l4_inner_cs
      *
      * Transport Mode:
      * Pkt: MAC  IP     ESP  L4
      * CSUM: l3_cs [ | l4_cs (checksum partial case)]
      *
      * Tunnel(VXLAN TCP/UDP) over Transport Mode
      * Pkt: MAC  IP     ESP  UDP  VXLAN  IP    L4
      * CSUM: l3_cs | l3_inner_cs | l4_inner_cs
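To make the flag table above concrete, here is a minimal userspace C sketch, not the mlx5 driver code. The enum values, the ipsec_csum_flags() helper and its has_inner_ip parameter (standing in for "xo->inner_ipproto is set") are hypothetical; only the flag combinations come from the table.

```c
#include <stdbool.h>

/* Hypothetical flag bits modeling the eseg checksum flags named above. */
enum csum_flag {
	CSUM_L3       = 1u << 0, /* l3_cs */
	CSUM_L4       = 1u << 1, /* l4_cs */
	CSUM_L3_INNER = 1u << 2, /* l3_inner_cs */
	CSUM_L4_INNER = 1u << 3, /* l4_inner_cs */
};

/*
 * Pick checksum flags for an IPsec crypto-offload packet.
 * has_inner_ip stands in for "xo->inner_ipproto is set"; unlike
 * skb->encapsulation it is false for plain transport mode (IP|ESP|L4).
 */
unsigned int ipsec_csum_flags(bool has_inner_ip, bool l4_csum_partial)
{
	unsigned int flags = CSUM_L3;  /* outer L3 checksum in every mode */

	if (has_inner_ip)              /* tunnel mode, or VXLAN over transport */
		return flags | CSUM_L3_INNER | CSUM_L4_INNER;
	if (l4_csum_partial)           /* transport mode, checksum partial case */
		flags |= CSUM_L4;
	return flags;
}
```

The key design point is that the inner flags depend only on whether an inner IP header actually exists, never on skb->encapsulation.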
      
      Fixes: f1267798 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPsec: Fix a misuse of the software parser's fields · d10457f8
      Emeel Hakim authored
      The current Software Parser (SWP) field settings in the ethernet
      segment (eseg) for IPsec crypto offload are not aligned with PRM/HW
      expectations. Among other issues, for an IP|ESP|TCP packet the driver
      sets the offsets for inner_l3 and inner_l4 although such packets have
      no inner L3/L4 headers relative to the ESP header.
      
      SWP provides the offsets to the HW so it can find the csum fields to
      offload the checksum. These are not necessarily used by the HW; they
      serve as a fallback in case the HW fails to parse the packet. E.g.,
      when performing IPsec Transport Aware (IP | ESP | TCP) there is no
      need to add SW parsing of an inner packet. As a result, the packet
      csum was calculated correctly in some cases but failed in others. The
      latter faced csum errors (caused by wrong packet length calculations),
      which led to many packet drops and hence low throughput.
      
      Fix by setting the SWP fields as expected for an IP|ESP|TCP packet.
      
      The following describes the expected SWP offsets:
      * Tunnel Mode:
      * SWP:      OutL3       InL3  InL4
      * Pkt: MAC  IP     ESP  IP    L4
      *
      * Transport Mode:
      * SWP:      OutL3       OutL4
      * Pkt: MAC  IP     ESP  L4
      *
      * Tunnel(VXLAN TCP/UDP) over Transport Mode
      * SWP:      OutL3                   InL3  InL4
      * Pkt: MAC  IP     ESP  UDP  VXLAN  IP    L4
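A similarly hedged sketch of which SWP offsets the table above expects to be filled. enum swp_field and ipsec_swp_fields() are hypothetical names; the real driver writes byte offsets into the eseg rather than returning a bitmask.

```c
#include <stdbool.h>

/* Hypothetical bits naming which Software Parser (SWP) offsets are filled. */
enum swp_field {
	SWP_OUTER_L3 = 1u << 0,
	SWP_OUTER_L4 = 1u << 1,
	SWP_INNER_L3 = 1u << 2,
	SWP_INNER_L4 = 1u << 3,
};

/* has_inner_ip: an inner IP header follows ESP (tunnel mode, or VXLAN
 * over transport mode). */
unsigned int ipsec_swp_fields(bool has_inner_ip)
{
	if (has_inner_ip)
		return SWP_OUTER_L3 | SWP_INNER_L3 | SWP_INNER_L4;
	/* Transport mode (IP|ESP|L4): the L4 header is an outer one,
	 * so no inner offsets are set. */
	return SWP_OUTER_L3 | SWP_OUTER_L4;
}
```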
      
      Fixes: f1267798 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
      Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
      Reviewed-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix vlan data lost during suspend flow · 68e66e1a
      Moshe Shemesh authored
      During the suspend flow the driver calls mlx5e_destroy_vlan_table(),
      which not only deletes the VLAN steering flow rules but also frees the
      data of the currently active VLANs, so that data is not restored during
      the resume flow.
      
      This fix keeps the VLAN data during the suspend flow and frees it only
      in the driver remove flow.
      
      Fixes: 6783f0a2 ("net/mlx5e: Dynamic alloc vlan table for netdev when needed")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Return correct error code on group creation failure · a6f74333
      Dmytro Linkin authored
      Dan Carpenter reports:
      The patch f47e04eb: "net/mlx5: E-switch, Allow setting share/max
      tx rate limits of rate groups" from May 31, 2021, leads to the
      following Smatch static checker warning:
      
      	drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c:483 esw_qos_create_rate_group()
      	warn: passing zero to 'ERR_PTR'
      
      If min rate normalization fails, the error code may be overwritten
      with 0 when the scheduling element destruction succeeds. Ignore that
      return value and always return the initial error.
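The bug pattern is easy to reproduce outside the driver. In this minimal C sketch the two helpers are hypothetical stand-ins that always fail and always succeed respectively; the point is that reusing err for the cleanup result turns a failure into 0. In the kernel, ERR_PTR(0) is NULL, which IS_ERR() does not treat as an error, hence the Smatch warning.

```c
#include <errno.h>

/* Hypothetical stand-ins for the two steps in the group creation path. */
static int normalize_min_rate(void)    { return -EINVAL; } /* pretend failure */
static int destroy_sched_element(void) { return 0; }       /* cleanup succeeds */

/* Buggy shape: the cleanup result clobbers the original error. */
int create_group_buggy(void)
{
	int err = normalize_min_rate();

	if (err)
		err = destroy_sched_element(); /* err becomes 0 here */
	return err;
}

/* Fixed shape: ignore the cleanup result and report the initial error. */
int create_group_fixed(void)
{
	int err = normalize_min_rate();

	if (err)
		destroy_sched_element();
	return err;
}
```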
      
      Fixes: f47e04eb ("net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Lag, change multipath and bonding to be mutually exclusive · 14fe2471
      Maor Dickman authored
      Both multipath and bonding events change the HW LAG state
      independently.
      Handling the events of one feature while the other is already
      enabled can cause unwanted behavior; for example, handling a
      bonding event while multipath is enabled will disable the LAG and
      cause multipath to stop working.
      
      Fix it by ignoring bonding events while in multipath mode and
      ignoring FIB events while in bonding mode.
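A minimal sketch of the mutual exclusion described above, assuming a single hypothetical mode field records which feature currently owns the HW LAG (the real mlx5 lag state is richer):

```c
#include <stdbool.h>

/* Hypothetical model of which feature owns the HW LAG. */
enum lag_mode { LAG_NONE, LAG_BONDING, LAG_MULTIPATH };

struct lag_state {
	enum lag_mode mode;
};

/* Bonding events are ignored while multipath owns the LAG. */
bool lag_handle_bond_event(struct lag_state *s, bool bond_active)
{
	if (s->mode == LAG_MULTIPATH)
		return false;                   /* event ignored */
	s->mode = bond_active ? LAG_BONDING : LAG_NONE;
	return true;
}

/* FIB (multipath) events are ignored while bonding owns the LAG. */
bool lag_handle_fib_event(struct lag_state *s, bool multipath_active)
{
	if (s->mode == LAG_BONDING)
		return false;                   /* event ignored */
	s->mode = multipath_active ? LAG_MULTIPATH : LAG_NONE;
	return true;
}
```

With this shape, a bonding event arriving while multipath is active can no longer tear the LAG down underneath it.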
      
      Fixes: 544fe7c2 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • ice: Add missing E810 device ids · 7dcf78b8
      Tony Nguyen authored
      As part of support for E810 XXV devices, some device ids were
      inadvertently left out. Add those missing ids.
      
      Fixes: 195fb977 ("ice: add additional E810 device id")
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
    • igc: Update I226_K device ID · 79cc8322
      Sasha Neftin authored
      The device ID for I226_K was incorrectly assigned; update it to the
      correct one.
      
      Fixes: bfa5e98c ("igc: Add new device ID")
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • e1000e: Fix packet loss on Tiger Lake and later · 639e298f
      Sasha Neftin authored
      Update the HW MAC initialization flow. Do not gate the DMA clock from
      the modPHY block. Keeping this clock ungated prevents packets sent in
      burst mode on the Kumeran interface from being dropped.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377
      Fixes: fb776f5d ("e1000e: Add support for Tiger Lake")
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Mark Pearson <markpearson@lenovo.com>
      Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • e1000e: Separate TGP board type from SPT · 280db5d4
      Sasha Neftin authored
      We have the same LAN controller on different PCHs. Separate TGP board
      type from SPT which will allow for specific fixes to be applied for
      TGP platforms.
      
      Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Tested-by: Mark Pearson <markpearson@lenovo.com>
      Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ptp: Fix possible memory leak in ptp_clock_register() · 4225fea1
      Yang Yingliang authored
      I got the following memory leak when doing a fault injection test:
      
      unreferenced object 0xffff88800906c618 (size 8):
        comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s)
        hex dump (first 8 bytes):
          70 74 70 30 00 00 00 00                          ptp0....
        backtrace:
          [<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0
          [<0000000079f6e2ff>] kvasprintf+0xb5/0x150
          [<0000000026aae54f>] kvasprintf_const+0x60/0x190
          [<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150
          [<000000004e35abdd>] dev_set_name+0xc0/0x100
          [<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp]
          [<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33]
      
      When posix_clock_register() returns an error, the name allocated
      in dev_set_name() is leaked. put_device() should be used to give up
      the device reference; the name will then be freed in
      kobject_cleanup() and the other memory in ptp_clock_release().
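The reference-counting rule can be modeled in userspace. In this sketch struct toy_dev, toy_put() and toy_register() are hypothetical stand-ins for struct device, put_device() and the registration path, not the ptp code; the point is that once the name belongs to a refcounted object, the error path must drop the reference so the release callback frees everything, rather than freeing pieces by hand.

```c
#include <stdlib.h>
#include <string.h>

/* Toy refcounted object: a hypothetical stand-in for struct device. */
struct toy_dev {
	int refcount;
	char *name;              /* plays the role of the dev_set_name() string */
};

int toy_releases;            /* counts release() calls, for demonstration */

/* Release callback: frees the name together with the object. */
static void toy_release(struct toy_dev *d)
{
	free(d->name);
	free(d);
	toy_releases++;
}

/* Drop one reference; the last put runs the release callback. */
void toy_put(struct toy_dev *d)
{
	if (--d->refcount == 0)
		toy_release(d);
}

/* "fail" simulates a later registration step failing after the name
 * has already been allocated. */
struct toy_dev *toy_register(int fail)
{
	struct toy_dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->name = malloc(sizeof("ptp0"));
	if (d->name)
		memcpy(d->name, "ptp0", sizeof("ptp0"));
	if (fail || !d->name) {
		/* free(d) alone would leak d->name; putting the
		 * reference lets release free both. */
		toy_put(d);
		return NULL;
	}
	return d;
}
```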
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Fixes: a33121e5 ("ptp: fix the race between the release of ptp_clock and cdev")
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Fix E2E delay mechanism · 3cb95802
      Kurt Kanzenbach authored
      When utilizing the End-to-End delay mechanism, the following error messages show up:
      
      |root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m
      |ptp4l[950.573]: selected /dev/ptp3 as PTP clock
      |ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
      |ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
      |ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1
      |ptp4l[956.879]: selected best master clock 001395.fffe.4897b4
      |ptp4l[956.879]: port 1: assuming the grand master role
      |ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER
      |ptp4l[962.017]: port 1: received DELAY_REQ without timestamp
      |ptp4l[962.273]: port 1: received DELAY_REQ without timestamp
      |ptp4l[963.090]: port 1: received DELAY_REQ without timestamp
      
      Commit f2fb6b62 ("net: stmmac: enable timestamp snapshot for required PTP
      packets in dwmac v5.10a") already addresses this problem for dwmac
      v5.10. However, the same holds true for all dwmacs above version v4.10.
      Correct the check accordingly. Afterwards everything works as expected.
      
      Tested on Intel Atom(R) x6414RE Processor.
      
      Fixes: 14f34733 ("net: stmmac: Correctly take timestamp for PTPv2")
      Fixes: f2fb6b62 ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a")
      Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: st95hf: Make spi remove() callback return zero · 641e3fd1
      Uwe Kleine-König authored
      If something goes wrong in the remove callback, returning an error code
      just results in an error message. The device still disappears.
      
      So don't skip disabling the regulator in st95hf_remove() if resetting
      the controller via spi fails. Also don't return an error code which just
      results in two error messages.
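A minimal sketch of the remove() shape described above, with a hypothetical regulator flag and reset result in place of the real st95hf and SPI calls:

```c
#include <stdbool.h>
#include <stdio.h>

bool regulator_on = true;    /* hypothetical regulator state */

static void toy_regulator_disable(void)
{
	regulator_on = false;
}

/* reset_err simulates the result of resetting the controller via SPI. */
int toy_remove(int reset_err)
{
	if (reset_err < 0)
		fprintf(stderr, "reset via spi failed: %d\n", reset_err);
	/* Keep tearing down even on failure: the device goes away anyway. */
	toy_regulator_disable();
	return 0;    /* an error code here would only add a second message */
}
```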
      
      Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns3-fixes' · 323e9a95
      David S. Miller authored
      Guangbin Huang says:
      
      ====================
      net: hns3: add some fixes for -net
      
      This series adds some fixes for the HNS3 ethernet driver.
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: disable sriov before unload hclge layer · 0dd8a25f
      Peng Li authored
      The HNS3 driver consists of hns3.ko, hnae3.ko and hclge.ko.
      hns3.ko contains the network stack and pci_driver, hclge.ko contains
      the HW device actions, algo_ops and timer task, and hnae3.ko contains
      some register functions.
      
      When SRIOV is enabled and hclge.ko is removed, the HW device is
      unloaded but the VF still exists; the PF then does not reply to VF
      mailbox (mbx) messages, which causes errors.
      
      This patch fixes it by disabling SRIOV before removing hclge.ko.
      
      Fixes: e2cb1dec ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix vf reset workqueue cannot exit · 1385cc81
      Yufeng Mo authored
      The VF reset task is performed through the workqueue. It checks the
      value of hdev->reset_pending to determine whether to exit the loop.
      However, hdev->reset_pending may also be assigned by the interrupt
      function hclgevf_misc_irq_handle(), which may cause the loop to fail
      to exit and keep occupying the workqueue. This loop is not necessary,
      so remove it; the workqueue will be rescheduled if the reset needs to
      be retried or a new reset occurs.
      
      Fixes: 1cc9bc6e ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: schedule the polling again when allocation fails · 68752b24
      Yunsheng Lin authored
      Currently, when an rx page allocation fails, polling may be stopped
      if there are no more packets to be received, which may cause a queue
      stall under memory pressure.
      
      This patch makes sure polling is scheduled again whenever an rx page
      allocation fails, and polling will keep trying to allocate receive
      buffers until it succeeds.
      
      Now that the allocation retry is added, it is unnecessary to do the rx
      page allocation at the end of rx cleaning, so remove it. Also reset
      unused_count to zero after calling hns3_nic_alloc_rx_buffers() to
      avoid calling hns3_nic_alloc_rx_buffers() repeatedly under memory
      pressure.
      
      Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix for miscalculation of rx unused desc · 9f9f0f19
      Yunsheng Lin authored
      An rx unused desc is a desc that needs a new buffer attached before
      it is refilled to the hw to receive new packets. The number of descs
      needing a new buffer is calculated from next_to_use and next_to_clean.
      When next_to_use == next_to_clean, the hns3 driver currently assumes
      that all descs have a buffer attached, but 'next_to_use ==
      next_to_clean' can also mean that all descs need a new buffer, if the
      hw has consumed all descs and the driver has not attached any buffer
      to them yet.
      
      This patch adds 'refill' in desc_cb to indicate whether a new
      buffer has been refilled to a desc.
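The ambiguity is the classic full-versus-empty problem of a ring indexed by two cursors. This hedged C sketch uses a hypothetical 8-entry ring and a per-desc refill flag (mirroring the idea, not the hns3 desc_cb layout) to break the tie when the indices are equal:

```c
#include <stdbool.h>

#define RING_LEN 8   /* hypothetical ring size */

struct toy_ring {
	unsigned int next_to_use;    /* next desc the driver will refill */
	unsigned int next_to_clean;  /* next desc the driver will clean */
	bool refill[RING_LEN];       /* per-desc "buffer attached" flag */
};

/* Number of descs in [next_to_use, next_to_clean) that need a buffer. */
unsigned int ring_unused(const struct toy_ring *r)
{
	unsigned int ntu = r->next_to_use, ntc = r->next_to_clean;

	if (ntu == ntc)
		/* Equal indices alone cannot tell "all refilled" from
		 * "all consumed"; the flag on the desc at ntu decides. */
		return r->refill[ntu] ? 0 : RING_LEN;
	return (ntc + RING_LEN - ntu) % RING_LEN;
}
```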
      
      Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix the max tx size according to user manual · adfb7b49
      Yunsheng Lin authored
      Currently the max tx size supported by the hw is calculated from
      the max BD num supported by the hw. According to the hw user
      manual, the max tx size is a fixed value for both non-TSO and
      TSO skbs.
      
      This patch updates the max tx size according to the manual.
      
      Fixes: 8ae10cfb ("net: hns3: support tx-scatter-gather-fraglist feature")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add limit ets dwrr bandwidth cannot be 0 · 731797fd
      Guangbin Huang authored
      If the ETS DWRR bandwidth of a TC is set to 0, the hardware switches
      to SP mode. In this case, this TC may occupy all the tx bandwidth if
      it has heavy traffic, which violates the purpose of the user setting.
      
      To fix this problem, require the ETS DWRR bandwidth to be greater than 0.
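A minimal validation sketch, assuming a hypothetical per-TC array of transmission selection algorithms and bandwidth percentages (loosely modeled on an IEEE 802.1Qaz ETS configuration, not the hns3 code):

```c
#include <errno.h>
#include <stdint.h>

#define MAX_TC 8   /* IEEE 802.1Qaz defines up to 8 traffic classes */

/* Hypothetical subset of transmission selection algorithms. */
enum tsa { TSA_STRICT, TSA_ETS };

/* Return -EINVAL if any ETS traffic class is given 0% bandwidth, since
 * the hardware would fall back to strict priority (SP) for that TC. */
int check_ets_bw(const enum tsa tsa_of[MAX_TC], const uint8_t bw[MAX_TC])
{
	for (int i = 0; i < MAX_TC; i++)
		if (tsa_of[i] == TSA_ETS && bw[i] == 0)
			return -EINVAL;
	return 0;
}
```

Strict-priority TCs are allowed a 0 weight here because the weight only matters for ETS scheduling.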
      
      Fixes: cacde272 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: reset DWRR of unused tc to zero · b63fcaab
      Guangbin Huang authored
      Currently, the DWRR of a TC is initialized to a fixed value when the
      TC is enabled, but it is not reset to 0 when the TC is disabled. As a
      result, the DWRR of an unused TC is not 0 after using the tc tool to
      add and delete multi-TC parameters.
      
      For example, enable 4 TCs and then restore to 1 TC with the following
      tc commands:
      
      $ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
        8@0 8@8 8@16 8@24 hw 1 mode channel
      $ tc qdisc del dev eth0 root
      
      Now only one TC is enabled for eth0, but the TC info queried via
      debugfs is:
      
      $ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
      enabled tc number: 1
      weight_offset: 14
      TC    MODE  WEIGHT
      0     dwrr    100
      1     dwrr    100
      2     dwrr    100
      3     dwrr    100
      4     dwrr      0
      5     dwrr      0
      6     dwrr      0
      7     dwrr      0
      
      This patch fixes it by resetting DWRR of tc to 0 when tc is disabled.
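A sketch of the fix, assuming a hypothetical fixed default weight of 100 (the value shown for the enabled TCs in the debugfs output above) and an 8-entry weight table; set_tc_dwrr() is an illustrative name, not the hns3 function:

```c
#define MAX_TC 8
#define DEFAULT_DWRR 100   /* assumption: weight shown for enabled TCs above */

/* Enabled TCs get the default weight; every disabled TC is reset to 0
 * so stale weights from an earlier multi-TC setup do not linger. */
void set_tc_dwrr(unsigned char dwrr[MAX_TC], unsigned int num_tc)
{
	for (unsigned int i = 0; i < MAX_TC; i++)
		dwrr[i] = i < num_tc ? DEFAULT_DWRR : 0;
}
```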
      
      Fixes: 84844054 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: Add configuration of TM QCN error event · 60484103
      Jiaran Zhang authored
      Add configuration of the interrupt type and FIFO interrupt enable for
      the TM QCN error event if it is enabled; otherwise this event will not
      be reported when an error occurs.
      
      Fixes: d914971d ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
      Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com>
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: Revert "Reset skb conntrack connection..." · 55161e67
      Eugene Crosser authored
      This reverts commit 09e856d5.
      
      When an interface is enslaved in a VRF, the prerouting conntrack hook is
      called twice: once in the context of the original input interface, and
      once in the context of the VRF interface. If no special precautions are
      taken, this leads to the creation of two conntrack entries instead of one,
      and breaks SNAT.
      
      The commit above was intended to avoid the creation of extra conntrack
      entries when the input interface is enslaved in a VRF. It did so by
      resetting the conntrack-related data associated with the skb when it
      enters the VRF context.
      
      However, it breaks netfilter operation. Imagine a use case where the
      conntrack zone must be assigned based on the original input interface
      rather than the VRF interface (which would make the original interfaces
      indistinguishable). One could create netfilter rules similar to these:
      
              chain rawprerouting {
                      type filter hook prerouting priority raw;
                      iif realiface1 ct zone set 1 return
                      iif realiface2 ct zone set 2 return
              }
      
      This works before the mentioned commit, but not after: zone assignment
      is "forgotten", and any subsequent NAT or filtering that is dependent
      on the conntrack zone does not work.
      
      Here is a reproducer script that demonstrates the difference in behaviour.
      
      ==========
      #!/bin/sh
      
      # This script demonstrates unexpected change of nftables behaviour
      # caused by commit 09e856d5 ("vrf: Reset skb conntrack
      # connection on VRF rcv")
      # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
      #
      # Before the commit, it was possible to assign a conntrack zone to a
      # packet (or mark it for `notracking`) in the prerouting chain, raw
      # priority, based on the `iif` (interface from which the packet
      # arrived).
      # After the change, if the interface is enslaved in a VRF, such
      # assignment is lost. Instead, assignment based on the `iif` matching
      # the VRF master interface is honored. Thus it is impossible to
      # distinguish packets based on the original interface.
      #
      # This script demonstrates this change of behaviour: conntrack zone 1
      # or 2 is assigned depending on the match with the original interface
      # or the vrf master interface. It can be observed that conntrack entry
      # appears in different zone in the kernel versions before and after
      # the commit.
      
      IPIN=172.30.30.1
      IPOUT=172.30.30.2
      PFXL=30
      
      ip li sh vein >/dev/null 2>&1 && ip li del vein
      ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
      nft list table testct >/dev/null 2>&1 && nft delete table testct
      
      ip li add vein type veth peer veout
      ip li add tvrf type vrf table 9876
      ip li set veout master tvrf
      ip li set vein up
      ip li set veout up
      ip li set tvrf up
      /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
      /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
      ip addr add $IPIN/$PFXL dev vein
      ip addr add $IPOUT/$PFXL dev veout
      
      nft -f - <<__END__
      table testct {
      	chain rawpre {
      		type filter hook prerouting priority raw;
      		iif { veout, tvrf } meta nftrace set 1
      		iif veout ct zone set 1 return
      		iif tvrf ct zone set 2 return
      		notrack
      	}
      	chain rawout {
      		type filter hook output priority raw;
      		notrack
      	}
      }
      __END__
      
      uname -rv
      conntrack -F
      ping -W 1 -c 1 -I vein $IPOUT
      conntrack -L
      
      Signed-off-by: Eugene Crosser <crosser@average.org>
      Acked-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Oct, 2021 5 commits
  4. 18 Oct, 2021 9 commits