1. 23 Aug, 2023 10 commits
  2. 22 Aug, 2023 6 commits
  3. 21 Aug, 2023 5 commits
    • ice: Fix NULL pointer deref during VF reset · 67f6317d
      Petr Oros authored
      During a stress test that attached and detached a VF from KVM while
      simultaneously changing the VF's spoofcheck and trust settings, there was
      a NULL pointer dereference in ice_reset_vf() because the VF's VSI was NULL.
      
      More than one instance of ice_reset_vf() can be running at a given
      time. While we rebuild the VSI in ice_reset_vf(), another reset can be
      triggered from ice_service_task. In that case we can access the currently
      uninitialized VSI and cause a panic. The window for this race condition
      has been around for a long time, but it is much worse after commit
      227bf450 ("ice: move VSI delete outside deconfig") because
      the reset runs faster. ice_reset_vf() uses vf->cfg_lock, and taking
      this lock before accessing the VF VSI fixes the bug in all cases.
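
The ordering described above can be sketched in userspace C. This is only an illustration under stated assumptions: `struct vf`, `struct vsi` and `reset_vf()` are hypothetical stand-ins, a pthread mutex plays the role of vf->cfg_lock, and none of it is the ice driver's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the ice driver's real structures. */
struct vsi {
	int num_rxq;
};

struct vf {
	pthread_mutex_t cfg_lock;	/* plays the role of vf->cfg_lock */
	struct vsi *vsi;		/* NULL while another reset rebuilds it */
};

/*
 * Returns the number of Rx queues that would be stopped, or -1 when the
 * VSI is not available. The VSI pointer is read only after the lock is
 * held, so a concurrent reset cannot swap it out underneath us.
 */
static int reset_vf(struct vf *vf)
{
	int ret = -1;

	assert(vf != NULL);

	pthread_mutex_lock(&vf->cfg_lock);
	struct vsi *vsi = vf->vsi;	/* sampled under the lock */
	if (vsi)
		ret = vsi->num_rxq;
	pthread_mutex_unlock(&vf->cfg_lock);

	return ret;
}
```

Reading `vf->vsi` before taking the lock would reopen the window: the pointer could be sampled while another reset is in the middle of rebuilding the VSI.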
      
      The panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
      in ice_vsi_stop_all_rx_rings().
      
      With our reproducer, the time needed to hit the bug is:
      ~8h before commit 227bf450 ("ice: move VSI delete outside deconfig").
      ~20m after commit 227bf450 ("ice: move VSI delete outside deconfig").
      With this fix we were not able to reproduce it in ~48h.
      
      There was commit cf90b743 ("ice: Fix call trace with null VSI during
      VF reset") which also tried to fix this issue, but it was only
      partially resolved and the bug still exists.
      
      [ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [ 6420.665382] #PF: supervisor read access in kernel mode
      [ 6420.670521] #PF: error_code(0x0000) - not-present page
      [ 6420.675659] PGD 0
      [ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
      [ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
      [ 6420.698729] Workqueue: ice ice_service_task [ice]
      [ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
      [ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
      [ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
      [ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
      [ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
      [ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
      [ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
      [ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
      [ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
      [ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
      [ 6420.783742] FS:  0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
      [ 6420.791829] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
      [ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
      [ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 6420.811841] PKRU: 55555554
      [ 6420.811842] Call Trace:
      [ 6420.811843]  <TASK>
      [ 6420.811844]  ice_reset_vf+0x9a/0x450 [ice]
      [ 6420.811876]  ice_process_vflr_event+0x8f/0xc0 [ice]
      [ 6420.841343]  ice_service_task+0x23b/0x600 [ice]
      [ 6420.845884]  ? __schedule+0x212/0x550
      [ 6420.849550]  process_one_work+0x1e2/0x3b0
      [ 6420.853563]  ? rescuer_thread+0x390/0x390
      [ 6420.857577]  worker_thread+0x50/0x3a0
      [ 6420.861242]  ? rescuer_thread+0x390/0x390
      [ 6420.865253]  kthread+0xdd/0x100
      [ 6420.868400]  ? kthread_complete_and_exit+0x20/0x20
      [ 6420.873194]  ret_from_fork+0x1f/0x30
      [ 6420.876774]  </TASK>
      [ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
       r
      
      Fixes: efe41860 ("ice: Fix memory corruption in VF driver")
      Fixes: f23df522 ("ice: Fix spurious interrupt during removal of trusted VF")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Revert "ice: Fix ice VF reset during iavf initialization" · 0ecff05e
      Petr Oros authored
      This reverts commit 7255355a.
      
      After commit 7255355a we are not able to attach a VF to a VM:
      virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
      error: Failed to attach interface
      error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
      
      ice_check_vf_ready_for_cfg() already waits for the reset to finish.
      The new condition in ice_check_vf_ready_for_reset() only causes problems.
      
      Fixes: 7255355a ("ice: Fix ice VF reset during iavf initialization")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix receive buffer size miscalculation · 10083aef
      Jesse Brandeburg authored
      The driver is misconfiguring the hardware for some values of MTU such that
      it could use multiple descriptors to receive a packet when it could have
      simply used one.
      
      Change the driver to use a round-up instead of the result of a shift, as
      the shift can truncate the lower bits of the size, and result in the
      problem noted above. It also aligns this driver with similar code in i40e.
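
The difference between the shift and the round-up can be shown with a small worked example. It is a sketch under assumptions: the 128-byte unit (`DBUF_SHIFT` of 7) and all function names are chosen for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define DBUF_SHIFT 7u	/* assumption: buffer size is programmed in 128-byte units */

/* Old behaviour: the shift silently drops the low bits of the length. */
static uint32_t dbuf_by_shift(uint32_t len)
{
	return len >> DBUF_SHIFT;
}

/* New behaviour: round up so the programmed size never falls below len. */
static uint32_t dbuf_by_round_up(uint32_t len)
{
	return (len + (1u << DBUF_SHIFT) - 1) >> DBUF_SHIFT;
}

/* Descriptors needed for one frame, given the programmed buffer units. */
static uint32_t descs_needed(uint32_t frame_len, uint32_t dbuf_units)
{
	uint32_t buf = dbuf_units << DBUF_SHIFT;

	assert(buf != 0);
	return (frame_len + buf - 1) / buf;
}
```

For a 3000-byte buffer, the shift programs 23 units (2944 bytes), so a 3000-byte frame spills into a second descriptor, while the round-up programs 24 units (3072 bytes) and the frame fits in one.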
      
      The insidiousness of this problem is that everything works with the wrong
      size, it's just not working as well as it could, as some MTU sizes end up
      using two or more descriptors, and there is no way to tell that is
      happening without looking at ice_trace or a bus analyzer.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • wifi: mac80211: limit reorder_buf_filtered to avoid UBSAN warning · b98c1610
      Ping-Ke Shih authored
      The commit 06470f74 ("mac80211: add API to allow filtering frames in BA sessions")
      added reorder_buf_filtered to mark frames filtered by firmware, and it
      can only work correctly if hw.max_rx_aggregation_subframes <= 64 since
      it stores the bitmap in a u64 variable.
      
      However, new HE or EHT devices can support BlockAck numbers up to 256 or
      1024, and using a subframe index of 64 or higher leads to a UBSAN warning:
      
       UBSAN: shift-out-of-bounds in net/mac80211/rx.c:1129:39
       shift exponent 215 is too large for 64-bit type 'long long unsigned int'
       Call Trace:
        <IRQ>
        dump_stack_lvl+0x48/0x70
        dump_stack+0x10/0x20
        __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
        ieee80211_release_reorder_frame.constprop.0.cold+0x64/0x69 [mac80211]
        ieee80211_sta_reorder_release+0x9c/0x400 [mac80211]
        ieee80211_prepare_and_rx_handle+0x1234/0x1420 [mac80211]
        ieee80211_rx_list+0xaef/0xf60 [mac80211]
        ieee80211_rx_napi+0x53/0xd0 [mac80211]
      
      Since only old hardware that supports <=64 BlockAck uses
      ieee80211_mark_rx_ba_filtered_frames(), keep its use limited to such
      hardware: add a WARN_ONCE() and a comment noting that this function must
      not be used if the hardware capability is not suitable.
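
Why a 64-bit bitmap cannot hold higher subframe indexes can be sketched as follows. `mark_filtered()` is a hypothetical userspace helper, not the mac80211 function; it only shows the kind of bound that keeps the shift defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FILTER_BITMAP_BITS 64u	/* capacity of a u64 bitmap */

/*
 * Mark subframe `index` as filtered. Indexes that do not fit in the
 * 64-bit bitmap are rejected, because shifting a 64-bit value by 64 or
 * more is undefined behaviour in C (this is what UBSAN reports above).
 */
static bool mark_filtered(uint64_t *bitmap, unsigned int index)
{
	assert(bitmap != NULL);

	if (index >= FILTER_BITMAP_BITS)
		return false;

	*bitmap |= UINT64_C(1) << index;
	return true;
}
```

With this guard, index 63 sets the top bit, while an index such as 215 from the report is refused and leaves the bitmap unchanged.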
      Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
      Link: https://lore.kernel.org/r/20230818014004.16177-1-pkshih@realtek.com
      [edit commit message]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • MAINTAINERS: add entry for macsec · d1cdbf66
      Sabrina Dubroca authored
      Jakub asked if I'd be willing to be the maintainer of the macsec code
      and review the driver code adding macsec offload, so let's add the
      corresponding entry.
      
      The keyword lines are meant to catch selftests and patches adding HW
      offload support to other drivers.
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 20 Aug, 2023 6 commits
    • selftests/net: Add log.txt and tools to .gitignore · 144e22e7
      Anh Tuan Phan authored
      Update .gitignore to untrack tools directory and log.txt. "tools" is
      generated in "selftests/net/Makefile" and log.txt is generated in
      "selftests/net/gro.sh" when executing run_all_tests.
      Signed-off-by: Anh Tuan Phan <tuananhlfc@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fix data-races around inet->inet_id · f866fbc8
      Eric Dumazet authored
      UDP sendmsg() is lockless, so ip_select_ident_segs()
      can very well be run from multiple cpus [1]
      
      Convert inet->inet_id to an atomic_t, but implement
      a dedicated path for TCP, avoiding cost of a locked
      instruction (atomic_add_return())
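
The two paths can be approximated in userspace with C11 atomics. This is a sketch, not the kernel code: `struct sock_ids` and `select_ident()` are hypothetical, and `owner_holds_lock` stands in for "this is TCP and the socket lock is held".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket ID counter. */
struct sock_ids {
	atomic_uint inet_id;
};

/* Reserve `segs` consecutive IP IDs and return the first one. */
static uint16_t select_ident(struct sock_ids *sk, unsigned int segs,
			     bool owner_holds_lock)
{
	unsigned int first;

	assert(segs > 0);

	if (owner_holds_lock) {
		/*
		 * TCP-like path: callers are already serialised by the
		 * socket lock, so a separate load and store is enough and
		 * no locked read-modify-write instruction is needed.
		 */
		first = atomic_load_explicit(&sk->inet_id,
					     memory_order_relaxed);
		atomic_store_explicit(&sk->inet_id, first + segs,
				      memory_order_relaxed);
	} else {
		/* Lockless UDP-like path: one atomic read-modify-write. */
		first = atomic_fetch_add_explicit(&sk->inet_id, segs,
						  memory_order_relaxed);
	}

	return (uint16_t)first;
}
```

The load-then-store variant is only safe because the lock excludes concurrent callers; on the lockless path it would reintroduce the lost-update race that KCSAN reported.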
      
      Note that this patch will cause a trivial merge conflict
      because we added inet->flags in net-next tree.
      
      v2: added missing change in
      drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
      (David Ahern)
      
      [1]
      
      BUG: KCSAN: data-race in __ip_make_skb / __ip_make_skb
      
      read-write to 0xffff888145af952a of 2 bytes by task 7803 on cpu 1:
      ip_select_ident_segs include/net/ip.h:542 [inline]
      ip_select_ident include/net/ip.h:556 [inline]
      __ip_make_skb+0x844/0xc70 net/ipv4/ip_output.c:1446
      ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
      udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
      inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmmsg+0x269/0x500 net/socket.c:2634
      __do_sys_sendmmsg net/socket.c:2663 [inline]
      __se_sys_sendmmsg net/socket.c:2660 [inline]
      __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff888145af952a of 2 bytes by task 7804 on cpu 0:
      ip_select_ident_segs include/net/ip.h:541 [inline]
      ip_select_ident include/net/ip.h:556 [inline]
      __ip_make_skb+0x817/0xc70 net/ipv4/ip_output.c:1446
      ip_make_skb+0x233/0x2c0 net/ipv4/ip_output.c:1560
      udp_sendmsg+0x1199/0x1250 net/ipv4/udp.c:1260
      inet_sendmsg+0x63/0x80 net/ipv4/af_inet.c:830
      sock_sendmsg_nosec net/socket.c:725 [inline]
      sock_sendmsg net/socket.c:748 [inline]
      ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2494
      ___sys_sendmsg net/socket.c:2548 [inline]
      __sys_sendmmsg+0x269/0x500 net/socket.c:2634
      __do_sys_sendmmsg net/socket.c:2663 [inline]
      __se_sys_sendmmsg net/socket.c:2660 [inline]
      __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2660
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x184d -> 0x184e
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 7804 Comm: syz-executor.1 Not tainted 6.5.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
      ==================================================================
      
      Fixes: 23f57406 ("ipv4: avoid using shared IP generator for connected sockets")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: validate veth and vxcan peer ifindexes · f534f658
      Jakub Kicinski authored
      veth and vxcan need to make sure the ifindexes of the peer
      are not negative; the core does not validate this.
      
      Using iproute2 with user-space-level checking removed:
      
      Before:
      
        # ./ip link add index 10 type veth peer index -1
        # ip link show
        1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
          link/ether 52:54:00:74:b2:03 brd ff:ff:ff:ff:ff:ff
        10: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether 8a:90:ff:57:6d:5d brd ff:ff:ff:ff:ff:ff
        -1: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
          link/ether ae:ed:18:e6:fa:7f brd ff:ff:ff:ff:ff:ff
      
      Now:
      
        $ ./ip link add index 10 type veth peer index -1
        Error: ifindex can't be negative.
      
      This problem surfaced in net-next because an explicit WARN()
      was added; the root cause is older.
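
The validation itself is small. A minimal sketch, assuming a hypothetical helper `validate_peer_ifindex()` that reports the same message as the "Now" transcript above; it is not the function the patch touches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: reject a negative peer ifindex before any device
 * is created. Non-negative values are passed through unchanged.
 */
static int validate_peer_ifindex(int ifindex, const char **errmsg)
{
	assert(errmsg != NULL);

	if (ifindex < 0) {
		*errmsg = "ifindex can't be negative";
		return -EINVAL;
	}

	*errmsg = NULL;
	return 0;
}
```

Doing the check before creating either end of the pair means a rejected request leaves no half-created devices behind.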
      
      Fixes: e6f8f1a7 ("veth: Allow to create peer link with given ifindex")
      Fixes: a8f820a3 ("can: add Virtual CAN Tunnel driver (vxcan)")
      Reported-by: syzbot+5ba06978f34abb058571@syzkaller.appspotmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fixed_phy_register-return-value' · c727c6f7
      David S. Miller authored
      Ruan Jinjie says:
      
      ====================
      net: Fix return value check for fixed_phy_register()
      
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
      
      Changes in v3:
      - Drop the error fix patch for fixed_phy_get_gpiod().
      - Split the error code update code into another patch set as suggested.
      - Update the commit title and message.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: Fix return value check for fixed_phy_register() · 32bbe64a
      Ruan Jinjie authored
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
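
Why a NULL check misses such failures can be shown with a userspace re-creation of the error-pointer convention. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() in lower case; `register_fixed_phy()` and `probe()` are hypothetical and only follow the contract stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitation of the kernel's error-pointer helpers. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct phy_device {
	int addr;
};

/* Hypothetical: returns a valid pointer or an error pointer, never NULL. */
static struct phy_device *register_fixed_phy(struct phy_device *slot, bool fail)
{
	assert(slot != NULL);
	return fail ? err_ptr(-ENODEV) : slot;
}

static int probe(struct phy_device *slot, bool fail)
{
	struct phy_device *phy = register_fixed_phy(slot, fail);

	/* A `!phy` test would never fire: an error pointer is not NULL. */
	if (is_err(phy))
		return (int)ptr_err(phy);

	return 0;
}
```

A caller that tests only for NULL would treat the error pointer as a valid device and dereference it later.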
      
      Fixes: b0ba512e ("net: bcmgenet: enable driver to work without a device tree")
      Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Acked-by: Doug Berger <opendmb@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bgmac: Fix return value check for fixed_phy_register() · 23a14488
      Ruan Jinjie authored
      The fixed_phy_register() function returns error pointers and never
      returns NULL. Update the checks accordingly.
      
      Fixes: c25b23b8 ("bgmac: register fixed PHY for ARM BCM470X / BCM5301X chipsets")
      Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 19 Aug, 2023 13 commits
    • net: phy: Fix deadlocking in phy_error() invocation · a0e026e7
      Serge Semin authored
      Since commit 91a7cda1 ("net: phy: Fix race condition on link status
      change"), every phy_error() invocation has caused a nested-mutex-lock
      deadlock, because phy_error() is normally called from the PHY-driver
      threaded IRQ handlers, which since that change have been called with the
      phydev->lock mutex held. Here is the call chain:
      
      IRQ: phy_interrupt()
           +-> mutex_lock(&phydev->lock); <--------------------+
               drv->handle_interrupt()                         | Deadlock due
               +-> ERROR: phy_error()                          + to the nested
                          +-> phy_process_error()              | mutex lock
                              +-> mutex_lock(&phydev->lock); <-+
                                  phydev->state = PHY_ERROR;
                                  mutex_unlock(&phydev->lock);
               mutex_unlock(&phydev->lock);
      
      The problem can be easily reproduced just by calling phy_error() from any
      PHY-device threaded interrupt handler. Fix it by dropping the phydev->lock
      mutex lock from the phy_process_error() method and printing a nasty error
      message to the system log if the mutex isn't held in the caller execution
      context.
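
The shape of the fix can be sketched in userspace with a pthread mutex. Everything here is a hypothetical model (`struct phy`, `process_error()`, `handle_interrupt()`), and the trylock probe is just one way to approximate an "is the lock held" check; like such a check, it cannot tell which thread holds the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum phy_state { PHY_RUNNING, PHY_ERROR };

/* Hypothetical model of a PHY device with its state lock. */
struct phy {
	pthread_mutex_t lock;
	enum phy_state state;
	bool unsafe_context_seen;	/* stands in for the logged error message */
};

/* trylock fails when the mutex is already locked, even by this thread. */
static bool lock_is_held(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) == 0) {
		pthread_mutex_unlock(m);
		return false;
	}
	return true;
}

/* Caller must hold phy->lock; the helper no longer takes it itself. */
static void process_error(struct phy *phy)
{
	assert(phy);

	if (!lock_is_held(&phy->lock))
		phy->unsafe_context_seen = true;

	phy->state = PHY_ERROR;
}

/* Interrupt-handler shape: the lock is taken once, at the outermost level. */
static void handle_interrupt(struct phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	process_error(phy);	/* no nested lock, so no self-deadlock */
	pthread_mutex_unlock(&phy->lock);
}
```

Had `process_error()` called `pthread_mutex_lock()` itself, `handle_interrupt()` would block forever on a default, non-recursive mutex, which is the deadlock in the diagram above.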
      
      Note that for the fix to work correctly in the PHY subsystem itself,
      phydev->lock locking must be added to the phy_error_precise()
      function.
      
      Link: https://lore.kernel.org/netdev/20230816180944.19262-1-fancer.lancer@gmail.com
      Fixes: 91a7cda1 ("net: phy: Fix race condition on link status change")
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: handle 100G/25G active optical cables in sfp_parse_support · db1a6ad7
      Josua Mayer authored
      Handle extended compliance code 0x1 (SFF8024_ECC_100G_25GAUI_C2M_AOC)
      for active optical cables supporting 25G and 100G speeds.
      
      Since the specification makes no statement about transmitter range, and
      the specific SFP module that was tested features only 2m of fiber,
      short-range (SR) modes are selected.
      
      The 100G speed is irrelevant because it would require multiple fibers /
      multiple SFP28 modules combined under one netdev.
      sfp-bus.c only handles a single module per netdev, so only 25Gbps modes
      are selected.
      
      sfp_parse_support already handles SFF8024_ECC_100GBASE_SR4_25GBASE_SR
      with compatible properties, however that entry is a contradiction in
      itself since with SFP(28) 100GBASE_SR4 is impossible - that would likely
      be a mode for qsfp modules only.
      
      Add a case for SFF8024_ECC_100G_25GAUI_C2M_AOC selecting 25gbase-r
      interface mode and 25000baseSR link mode.
      Also enforce SFP28 bitrate limits on the values read from sfp eeprom as
      requested by Russell King.
      
      Tested with fs.com S28-AO02 AOC SFP28 module.
      Signed-off-by: Josua Mayer <josua@solid-run.com>
      Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio: mdio-bitbang: Fix C45 read/write protocol · 2572ce62
      Serge Semin authored
      Based on the original code semantics, in the case of Clause 45 MDIO the
      address command is supposed to be followed by a command sending the MMD
      address, not the CSR address. Commit 002dd3de ("net: mdio: mdio-bitbang:
      Separate C22 and C45 transactions") erroneously broke that: most likely
      due to an unfortunate variable name, it switched the code to sending
      the CSR address. In our case this caused a protocol malfunction, so the
      read operation always failed, with the turnaround bit always being driven
      to one by the PHY instead of zero. Fix that by restoring the correct
      behaviour: send the MMD address command right after the regular address
      command.
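
The frame sequence of a Clause 45 read can be sketched as follows. The helper names are hypothetical and this is not the mdio-bitbang code; it only builds the 14 header bits (ST, OP, PRTAD, DEVAD) that follow the preamble and shows that the read frame repeats the MMD (DEVAD), while the register number travels as the data of the address frame.

```c
#include <assert.h>
#include <stdint.h>

/* Clause 45 opcodes (the 2-bit OP field). */
enum c45_op {
	C45_OP_ADDR = 0,
	C45_OP_WRITE = 1,
	C45_OP_READ_INC = 2,
	C45_OP_READ = 3,
};

/* Header bits after the preamble: ST(2) = 00, OP(2), PRTAD(5), DEVAD(5). */
static uint16_t c45_header(enum c45_op op, unsigned int prtad,
			   unsigned int devad)
{
	assert(prtad < 32 && devad < 32);
	return (uint16_t)(((unsigned int)op << 10) | (prtad << 5) | devad);
}

/* The pieces of one indirect read: an address frame, then a read frame. */
struct c45_read {
	uint16_t addr_hdr;	/* OP = address, carries PRTAD and DEVAD */
	uint16_t addr_data;	/* register number, sent as frame data */
	uint16_t read_hdr;	/* OP = read, carries PRTAD and DEVAD again */
};

static struct c45_read c45_build_read(unsigned int prtad, unsigned int devad,
				      uint16_t reg)
{
	struct c45_read t;

	t.addr_hdr = c45_header(C45_OP_ADDR, prtad, devad);
	t.addr_data = reg;
	/* The second frame addresses the MMD, not the register. */
	t.read_hdr = c45_header(C45_OP_READ, prtad, devad);
	return t;
}
```

Putting the register number into the DEVAD field of the second frame would address a different (or nonexistent) MMD, so no PHY would drive the turnaround bit low.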
      
      Fixes: 002dd3de ("net: mdio: mdio-bitbang: Separate C22 and C45 transactions")
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: fix handling of 802.1X PAE frames · e94b590a
      Arınç ÜNAL authored
      802.1X PAE frames are link-local frames, therefore they must be trapped to
      the CPU port. Currently, the MT753X switches treat 802.1X PAE frames as
      regular multicast frames, therefore flooding them to user ports. To fix
      this, set 802.1X PAE frames to be trapped to the CPU port(s).
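
The intended forwarding decision can be modelled in software. The switch makes this decision in hardware through its configuration, so the `classify()` function below is purely a hypothetical illustration; the only fact it relies on is the 802.1X PAE group address 01:80:C2:00:00:03.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum fwd_action { FWD_FLOOD, FWD_TRAP_TO_CPU };

/* 802.1X PAE group address (link-local, must not be forwarded). */
static const uint8_t pae_group_addr[6] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x03 };

/*
 * Hypothetical software model: PAE frames are trapped to the CPU port,
 * every other multicast destination is flooded as before.
 */
static enum fwd_action classify(const uint8_t dmac[6])
{
	assert(dmac);

	if (memcmp(dmac, pae_group_addr, sizeof(pae_group_addr)) == 0)
		return FWD_TRAP_TO_CPU;

	return FWD_FLOOD;
}
```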
      
      Fixes: b8f126a8 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fixes-for-spectrum-4' · cfceccca
      Jakub Kicinski authored
      Petr Machata says:
      
      ====================
      mlxsw: Fixes for Spectrum-4
      
      This patchset contains an assortment of fixes for mlxsw Spectrum-4 support.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: mlxsw: Fix test failure on Spectrum-4 · f520489e
      Ido Schimmel authored
      Remove assumptions about shared buffer cell size and instead query the
      cell size from devlink. Adjust the test to send small packets that fit
      inside a single cell.
      
      Tested on Spectrum-{1,2,3,4}.
      
      Fixes: 47354021 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/f7dfbf3c4d1cb23838d9eb99bab09afaa320c4ca.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: Fix the size of 'VIRT_ROUTER_MSB' · 348c976b
      Amit Cohen authored
      The field 'virtual router' was extended to 12 bits in Spectrum-4.
      Therefore, the element 'MLXSW_AFK_ELEMENT_VIRT_ROUTER_MSB' needs 3 bits for
      Spectrum < 4 and 4 bits for Spectrum >= 4.
      
      The elements are stored in an internal storage scratchpad. Currently, the
      MSB is defined there as 3 bits. It means that for Spectrum-4, only 2K VRFs
      can be used for multicast routing, as the highest bit is not really used by
      the driver. Fix the definition of 'VIRT_ROUTER_MSB' to use 4 bits. Adjust
      the definitions of 'virtual router' field in the blocks accordingly - use
      '_avoid_size_check' for Spectrum-2 instead of for Spectrum-4. Fix the mask
      in parse function to use 4 bits.
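
The effect of the too-narrow MSB field can be checked with a little arithmetic. The sketch assumes the virtual router ID is split into an 8-bit LSB part and an MSB part of `msb_bits` bits (implied by the 12-bit total and the 4-bit MSB above); `vr_roundtrip()` is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a virtual router ID into LSB and MSB parts, keep only
 * `msb_bits` bits of the MSB part, and put the ID back together.
 */
static uint16_t vr_roundtrip(uint16_t vr, unsigned int msb_bits)
{
	assert(msb_bits <= 8);

	uint16_t lsb = vr & 0xff;
	uint16_t msb = (vr >> 8) & ((1u << msb_bits) - 1);

	return (uint16_t)((msb << 8) | lsb);
}
```

With a 3-bit MSB, ID 0x7ff survives but 0x800 collapses to 0, which is the 2K limit described above; with a 4-bit MSB the full 12-bit range round-trips.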
      
      Fixes: 6d5d8ebb ("mlxsw: Rename virtual router flex key element")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/79bed2b70f6b9ed58d4df02e9798a23da648015b.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: reg: Fix SSPR register layout · 0dc63b9c
      Ido Schimmel authored
      The two most significant bits of the "local_port" field in the SSPR
      register are always cleared since they are overwritten by the deprecated
      and overlapping "sub_port" field.
      
      On systems with more than 255 local ports (e.g., Spectrum-4), this
      results in the firmware maintaining invalid mappings between system port
      and local port. Specifically, two different system ports (0x1 and
      0x101) point to the same local port (0x1), which eventually leads to
      firmware errors.
      
      Fix by removing the deprecated "sub_port" field.
      
      Fixes: fd24b29a ("mlxsw: reg: Align existing registers to use extended local_port field")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/9b909a3033c8d3d6f67f237306bef4411c5e6ae4.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mlxsw: pci: Set time stamp fields also when its type is MIRROR_UTC · bc2de151
      Danielle Ratson authored
      Currently, in Spectrum-2 and above, time stamps are extracted from the CQE
      into the time stamp fields in 'struct mlxsw_skb_cb', only when the CQE
      time stamp type is UTC. The time stamps are read directly from the CQE and
      software can get the time stamp in UTC format using CQEv2.
      
      Starting with Spectrum-4, the time stamps read from the CQE may also be
      of the MIRROR_UTC type.
      
      Therefore, we get a warning [1] from the driver that the time stamp fields
      were not set when an LLDP control packet is sent.
      
      Allow the time stamp type to be MIRROR_UTC and set the time stamp in this
      case as well.
      
      [1]
       WARNING: CPU: 11 PID: 0 at drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:1409 mlxsw_sp2_ptp_hwtstamp_fill+0x1f/0x70 [mlxsw_spectrum]
      [...]
       Call Trace:
        <IRQ>
        mlxsw_sp2_ptp_receive+0x3c/0x80 [mlxsw_spectrum]
        mlxsw_core_skb_receive+0x119/0x190 [mlxsw_core]
        mlxsw_pci_cq_tasklet+0x3c9/0x780 [mlxsw_pci]
        tasklet_action_common.constprop.0+0x9f/0x110
        __do_softirq+0xbb/0x296
        irq_exit_rcu+0x79/0xa0
        common_interrupt+0x86/0xa0
        </IRQ>
        <TASK>
      
      Fixes: 47354021 ("mlxsw: spectrum: Extend to support Spectrum-4 ASIC")
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/bcef4d044ef608a4e258d33a7ec0ecd91f480db5.1692268427.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipvlan: Fix a reference count leak warning in ipvlan_ns_exit() · 043d5f68
      Lu Wei authored
      There are two network devices (veth1 and veth3) in ns1, and ipvlan1 in
      L3S mode and ipvlan2 in L2 mode are created on top of them, as shown in
      figure (1). In this case, ipvlan_register_nf_hook() will be called to
      register the nf hook needed by ipvlans in L3S mode in ns1, and the value
      of ipvl_nf_hook_refcnt is set to 1.
      
      (1)
                 ns1                           ns2
            ------------                  ------------
      
         veth1--ipvlan1 (L3S)
      
         veth3--ipvlan2 (L2)
      
      (2)
                 ns1                           ns2
            ------------                  ------------
      
         veth1--ipvlan1 (L3S)
      
               ipvlan2 (L2)                  veth3
           |                                  |
           |------->-------->--------->--------
                          migrate
      
      When veth3 migrates from ns1 to ns2 as in figure (2), veth3 registers in
      ns2 and calls call_netdevice_notifiers() with the NETDEV_REGISTER event:
      
      dev_change_net_namespace
          call_netdevice_notifiers
              ipvlan_device_event
                  ipvlan_migrate_l3s_hook
                      ipvlan_register_nf_hook(newnet)      (I)
                      ipvlan_unregister_nf_hook(oldnet)    (II)
      
      In ipvlan_migrate_l3s_hook(), ipvl_nf_hook_refcnt in ns1 is not 0
      since veth1 with ipvlan1 is still in ns1, so (I) and (II) are called to
      register the nf hook in ns2 and unregister it in ns1. As a result,
      ipvl_nf_hook_refcnt in ns1 is decreased incorrectly and the one in ns2
      is increased incorrectly. When the second net namespace is removed, a
      reference count leak warning in ipvlan_ns_exit() is triggered.
      
      This patch adds a check before ipvlan_migrate_l3s_hook() is called. The
      warning can be triggered as follows:
      
      $ ip netns add ns1
      $ ip netns add ns2
      $ ip netns exec ns1 ip link add veth1 type veth peer name veth2
      $ ip netns exec ns1 ip link add veth3 type veth peer name veth4
      $ ip netns exec ns1 ip link add ipv1 link veth1 type ipvlan mode l3s
      $ ip netns exec ns1 ip link add ipv2 link veth3 type ipvlan mode l2
      $ ip netns exec ns1 ip link set veth3 netns ns2
      $ ip net del ns2
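
      The refcount imbalance described above can be modelled in user space.
      The following is only a sketch: struct netns, the MODE_* enum and all
      function names are hypothetical stand-ins for the kernel objects, and
      the mode check in on_port_moved() is an assumption about what kind of
      check would avoid the imbalance, not a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ipvlan port mode and per-netns state. */
enum mode { MODE_L2, MODE_L3S };

struct netns {
    int ipvl_nf_hook_refcnt; /* counts L3S users of the nf hook in this netns */
};

static void register_nf_hook(struct netns *ns)
{
    ns->ipvl_nf_hook_refcnt++;
}

static void unregister_nf_hook(struct netns *ns)
{
    assert(ns->ipvl_nf_hook_refcnt > 0);
    ns->ipvl_nf_hook_refcnt--;
}

/* Mirrors steps (I) and (II): runs whenever the old netns has any L3S user. */
static void migrate_l3s_hook(struct netns *oldns, struct netns *newns)
{
    if (!oldns->ipvl_nf_hook_refcnt)
        return;
    register_nf_hook(newns);   /* (I)  */
    unregister_nf_hook(oldns); /* (II) */
}

/* Called when the lower device of an ipvlan port changes netns.
 * with_check == false models the behaviour described in the commit message;
 * with_check == true models an assumed "only migrate for L3S ports" check. */
static void on_port_moved(enum mode port_mode, struct netns *oldns,
                          struct netns *newns, bool with_check)
{
    if (with_check && port_mode != MODE_L3S)
        return;
    migrate_l3s_hook(oldns, newns);
}

/* Models the leak condition reported at netns exit: the count should be 0. */
static bool ns_exit_leaks(const struct netns *ns)
{
    return ns->ipvl_nf_hook_refcnt != 0;
}
```

      In this model, moving the L2 port's device without the check leaves ns1
      at 0 and ns2 at 1, so removing ns2 reports a leak. With the check, ns1
      stays at 1 and ns2 stays at 0.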
      
      Fixes: 3133822f ("ipvlan: use pernet operations and restrict l3s hooks to master netns")
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Link: https://lore.kernel.org/r/20230817145449.141827-1-luwei32@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      043d5f68
    • Eric Dumazet's avatar
      dccp: annotate data-races in dccp_poll() · cba3f178
      Eric Dumazet authored
      We changed tcp_poll() over time, but never updated dccp.
      
      Note that we also could remove dccp instead of maintaining it.
      
      Fixes: 7c657876 ("[DCCP]: Initial implementation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230818015820.2701595-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cba3f178
    • Eric Dumazet's avatar
      sock: annotate data-races around prot->memory_pressure · 76f33296
      Eric Dumazet authored
      *prot->memory_pressure is read/written locklessly, we need
      to add proper annotations.
      
      A recent commit added a new race; it is time to audit all accesses.
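
      As a rough illustration of what such annotations look like, the sketch
      below marks every lockless access to a shared flag. READ_ONCE() and
      WRITE_ONCE() are real kernel macros, but the definitions here are
      simplified user-space stand-ins (they rely on the GCC/Clang __typeof__
      extension), and struct proto_sketch and the helper functions are
      hypothetical names, not the code touched by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel macros of the same name:
 * the volatile access makes the compiler emit exactly one load or store. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical protocol descriptor: memory_pressure may be NULL. */
struct proto_sketch {
    unsigned long *memory_pressure;
};

/* Lockless reader: the dereference is annotated. */
static int proto_under_memory_pressure(const struct proto_sketch *prot)
{
    if (!prot->memory_pressure)
        return 0;
    return READ_ONCE(*prot->memory_pressure) != 0;
}

/* Lockless writers: the stores are annotated. */
static void proto_enter_memory_pressure(const struct proto_sketch *prot)
{
    if (prot->memory_pressure)
        WRITE_ONCE(*prot->memory_pressure, 1UL);
}

static void proto_leave_memory_pressure(const struct proto_sketch *prot)
{
    /* Skip the store when the flag is already clear. */
    if (prot->memory_pressure && READ_ONCE(*prot->memory_pressure))
        WRITE_ONCE(*prot->memory_pressure, 0UL);
}

/* Small self-check for the sketch. */
static void proto_sketch_selftest(void)
{
    unsigned long flag = 0;
    struct proto_sketch prot = { &flag };
    struct proto_sketch none = { NULL };

    assert(!proto_under_memory_pressure(&prot));
    proto_enter_memory_pressure(&prot);
    assert(proto_under_memory_pressure(&prot));
    proto_leave_memory_pressure(&prot);
    assert(!proto_under_memory_pressure(&prot));
    assert(!proto_under_memory_pressure(&none));
}
```

      The annotations do not add ordering or atomic read-modify-write
      semantics; they only document the intentional race and keep the
      compiler from tearing or repeating the access.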
      
      Fixes: 2d0c88e8 ("sock: Fix misuse of sk_under_memory_pressure()")
      Fixes: 4d93df0a ("[SCTP]: Rewrite of sctp buffer management code")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Abel Wu <wuyun.abel@bytedance.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Link: https://lore.kernel.org/r/20230818015132.2699348-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      76f33296
    • Vladimir Oltean's avatar
      net: dsa: felix: fix oversize frame dropping for always closed tc-taprio gates · d44036ca
      Vladimir Oltean authored
      The blamed commit resolved a bug where frames would still get stuck at
      egress, even though they're smaller than the maxSDU[tc], because the
      driver did not take into account the extra 33 ns that the queue system
      needs for scheduling the frame.
      
      It now takes that into account, but the arithmetic that we perform in
      vsc9959_tas_remaining_gate_len_ps() is buggy, because we operate on
      64-bit unsigned integers, so gate_len_ns - VSC9959_TAS_MIN_GATE_LEN_NS
      may become a very large integer if gate_len_ns < 33 ns.
      
      In practice, this means that we've introduced a regression where all
      traffic class gates which are permanently closed will not get detected
      by the driver, and we won't enable oversize frame dropping for them.
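
      The wrap-around described above can be reproduced in isolation. This is
      a minimal sketch, assuming a helper that converts a gate length in ns to
      the remaining length in ps: MIN_GATE_LEN_NS stands in for
      VSC9959_TAS_MIN_GATE_LEN_NS (33 ns, per the text above), and both
      function names are hypothetical. The guarded variant shows one possible
      way to avoid the underflow, not necessarily the exact change made here.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_GATE_LEN_NS 33ULL   /* stand-in for VSC9959_TAS_MIN_GATE_LEN_NS */
#define PSEC_PER_NSEC   1000ULL

/* Unguarded: the unsigned subtraction wraps when gate_len_ns < 33, so a
 * permanently closed gate (length 0) looks like an enormous one. */
static uint64_t remaining_gate_len_ps_unguarded(uint64_t gate_len_ns)
{
    return (gate_len_ns - MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
}

/* Guarded: gates shorter than the scheduling overhead have nothing left. */
static uint64_t remaining_gate_len_ps_guarded(uint64_t gate_len_ns)
{
    if (gate_len_ns < MIN_GATE_LEN_NS)
        return 0;
    return (gate_len_ns - MIN_GATE_LEN_NS) * PSEC_PER_NSEC;
}

/* Small self-check for the sketch. */
static void gate_len_selftest(void)
{
    /* A closed gate wraps to 2^64 - 33000 ps in the unguarded version. */
    assert(remaining_gate_len_ps_unguarded(0) == UINT64_MAX - 32999);
    /* The guarded version reports no usable time for the same gate. */
    assert(remaining_gate_len_ps_guarded(0) == 0);
    /* Gates longer than 33 ns give the same result in both versions. */
    assert(remaining_gate_len_ps_guarded(5120) == 5087000);
    assert(remaining_gate_len_ps_unguarded(5120) == 5087000);
}
```

      With the unguarded helper, a gate length of 0 compares as larger than
      the time needed for any frame, which is consistent with the "sending
      all frames" lines in the "Before" log below.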
      
      Before:
      mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
      mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 1 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 2 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 3 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 4 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 5 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 6 min gate len 0, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
      
      After:
      mscc_felix 0000:00:00.5: port 0: max frame size 1526 needs 12400000 ps, 1152000 ps for mPackets at speed 1000
      mscc_felix 0000:00:00.5: port 0 tc 0 min gate len 1000000, sending all frames
      mscc_felix 0000:00:00.5: port 0 tc 1 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 2 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 3 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 4 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 5 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 6 min gate length 0 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 1 octets including FCS
      mscc_felix 0000:00:00.5: port 0 tc 7 min gate length 5120 ns not enough for max frame size 1526 at 1000 Mbps, dropping frames over 615 octets including FCS
      
      Fixes: 11afdc65 ("net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20230817120111.3522827-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d44036ca