1. 06 Jun, 2024 25 commits
  2. 05 Jun, 2024 15 commits
    • ptp: Fix error message on failed pin verification · 323a359f
      Karol Kolacinski authored
      When verification of a PTP clock pin fails, the error message prints
      the channel number instead of the pin index after "pin", which is
      incorrect.
      
      Fix the error message by adding the channel number to it and printing
      the pin number instead of the channel number.
      
      Fixes: 6092315d ("ptp: introduce programmable pins.")
      Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP · f921a58a
      Eric Dumazet authored
      If a TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided,
      taprio_parse_mqprio_opt() must validate it, or userspace
      can inject arbitrary data into the kernel the second time
      taprio_change() is called.
      
      First call (with valid attributes) sets dev->num_tc
      to a non-zero value.
      
      Second call (with arbitrary mqprio attributes)
      returns early from taprio_parse_mqprio_opt()
      without validating them, and bad things can happen.
      
      Fixes: a3d43c0d ("taprio: Add support adding an admin schedule")
      Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
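
The two-call sequence described in the taprio commit above can be illustrated with a small Python model. This is a sketch, not kernel code: `Dev`, `validate`, `parse_buggy`, and `parse_fixed` are hypothetical names, and the only idea taken from the commit message is that an early return once `num_tc` is set skips validation of later attributes.

```python
from dataclasses import dataclass


@dataclass
class Dev:
    num_tc: int = 0  # stands in for dev->num_tc


def validate(priomap, num_tc):
    """Accept the map only if every entry names an existing traffic class."""
    return all(0 <= tc < num_tc for tc in priomap)


def parse_buggy(dev, opt):
    # Early return: once num_tc is set, later attributes are never checked.
    if dev.num_tc:
        return True
    return validate(opt["priomap"], opt["num_tc"])


def parse_fixed(dev, opt):
    # No attribute given: only acceptable if the device is already configured.
    if opt is None:
        return dev.num_tc != 0
    # Attribute given: always validate it, on the first call and on later ones.
    return validate(opt["priomap"], opt["num_tc"])


dev = Dev()
good = {"num_tc": 2, "priomap": [0, 1, 1, 0]}
bad = {"num_tc": 2, "priomap": [0, 7, 1, 0]}  # 7 is not a valid class

first_ok = parse_buggy(dev, good)   # first call validates
dev.num_tc = good["num_tc"]         # first call configures the device
second_buggy = parse_buggy(dev, bad)  # accepted without any check
second_fixed = parse_fixed(dev, bad)  # rejected
```

The model only shows where the check is skipped; what the kernel does with the unvalidated data afterwards is outside its scope.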
    • net/mlx5: Fix tainted pointer delete is case of flow rules creation fail · 229bedbf
      Aleksandr Mishin authored
      If flow rule creation fails in mlx5_lag_create_port_sel_table(),
      the tainted pointer is deleted several times instead of the
      previously created rules.
      Fix this bug by using the correct flow rule pointers.
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      
      Fixes: 352899f3 ("net/mlx5: Lag, use buckets in hash mode")
      Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'mlx5-fixes' · f8f0de9d
      David S. Miller authored
      Tariq Toukan says:
      
      ====================
      mlx5 core fixes 20240603
      
      This small patchset provides two bug fixes from the team to the mlx5 core driver.
      
      Series generated against:
      commit 33700a0c ("net/tcp: Don't consider TCP_CLOSE in TCP_AO_ESTABLISHED")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Always stop health timer during driver removal · c8b3f38d
      Shay Drory authored
      Currently, if teardown_hca fails to execute during driver removal, mlx5
      does not stop the health timer. Afterwards, mlx5 continues with driver
      teardown. This may lead to a UAF bug, which results in a page fault
      Oops[1], since the health timer fires after resources were freed.
      
      Hence, stop the health monitor even if teardown_hca fails.
      
      [1]
      mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
      mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
      mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
      mlx5_core 0000:18:00.0: E-Switch: cleanup
      mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource
      mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup
      BUG: unable to handle page fault for address: ffffa26487064230
      PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0
      Oops: 0000 [#1] PREEMPT SMP PTI
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE     -------  ---  6.7.0-68.fc38.x86_64 #1
      Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
      RIP: 0010:ioread32be+0x34/0x60
      RSP: 0018:ffffa26480003e58 EFLAGS: 00010292
      RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0
      RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230
      RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8
      R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0
      R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0
      FS:  0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       ? __die+0x23/0x70
       ? page_fault_oops+0x171/0x4e0
       ? exc_page_fault+0x175/0x180
       ? asm_exc_page_fault+0x26/0x30
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       ? ioread32be+0x34/0x60
       mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core]
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       poll_health+0x42/0x230 [mlx5_core]
       ? __next_timer_interrupt+0xbc/0x110
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       call_timer_fn+0x21/0x130
       ? __pfx_poll_health+0x10/0x10 [mlx5_core]
       __run_timers+0x222/0x2c0
       run_timer_softirq+0x1d/0x40
       __do_softirq+0xc9/0x2c8
       __irq_exit_rcu+0xa6/0xc0
       sysvec_apic_timer_interrupt+0x72/0x90
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x1a/0x20
      RIP: 0010:cpuidle_enter_state+0xcc/0x440
       ? cpuidle_enter_state+0xbd/0x440
       cpuidle_enter+0x2d/0x40
       do_idle+0x20d/0x270
       cpu_startup_entry+0x2a/0x30
       rest_init+0xd0/0xd0
       arch_call_rest_init+0xe/0x30
       start_kernel+0x709/0xa90
       x86_64_start_reservations+0x18/0x30
       x86_64_start_kernel+0x96/0xa0
       secondary_startup_64_no_verify+0x18f/0x19b
      ---[ end trace 0000000000000000 ]---
      
      Fixes: 9b98d395 ("net/mlx5: Start health poll at earlier stage of driver load")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Stop waiting for PCI if pci channel is offline · 33afbfcc
      Moshe Shemesh authored
      If the PCI channel goes offline, the driver should not wait for PCI
      reads during the health dump and recovery flows. The driver has a
      timeout for each of the loops trying to read PCI, so they would fail
      anyway. However, during recovery, waiting until the timeout may cause
      the PCI error_detected() callback to fail to meet the
      pci_dpc_recovered() wait timeout.
      
      Fixes: b3bd076f ("net/mlx5: Report devlink health on FW fatal issues")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drori <shayd@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: mtk_eth_soc: handle dma buffer size soc specific · c57e5581
      Frank Wunderlich authored
      The mainline MTK ethernet driver has long suffered from rare but
      annoying tx queue timeouts. We think that this is caused by fixed
      dma sizes hardcoded for all SoCs.
      
      We suspect this problem arises from a low number of free TX DMADs,
      that is, the TX ring being almost full.
      
      The transmit timeout is caused by the Tx queue not waking up. The
      Tx queue stops when the free counter is less than ring->thres, and
      it wakes up once the free counter is greater than ring->thres.
      If the CPU is too late to wake up the Tx queues, it may cause a
      transmit timeout.
      Therefore, we increased the TX and RX DMADs to improve this error
      situation.
      
      Use the dma-size implementation from the SDK on a per-SoC basis.
      Unlike the SDK, we have no RSS feature yet, so all RX/TX sizes
      should be raised from 512 to 2048 bytes, except fqdma on mt7988, to
      avoid the tx timeout issue.
      
      Fixes: 656e7052 ("net-next: mediatek: add support for MT7623 ethernet")
      Suggested-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: Frank Wunderlich <frank-w@public-files.de>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
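
The stop/wake rule described in the mtk_eth_soc commit above can be modelled in a few lines of Python. This is a rough sketch under stated assumptions: the `TxRing` class, its method names, and the ring size and threshold used below are hypothetical; only the two comparisons against `thres` come from the commit message.

```python
class TxRing:
    """Toy model of a Tx ring with a stop/wake threshold."""

    def __init__(self, size, thres):
        self.free = size      # free descriptors
        self.thres = thres    # stands in for ring->thres
        self.stopped = False

    def xmit(self, n=1):
        """Consume n descriptors; stop the queue if free drops below thres."""
        if self.stopped or n > self.free:
            return False
        self.free -= n
        if self.free < self.thres:
            self.stopped = True
        return True

    def complete(self, n=1):
        """Return n descriptors; wake the queue once free exceeds thres."""
        self.free += n
        if self.stopped and self.free > self.thres:
            self.stopped = False


ring = TxRing(size=8, thres=4)
ring.xmit(4)        # free == 4, not below thres, queue keeps running
ring.xmit(1)        # free == 3, below thres, queue stops
ring.complete(1)    # free == 4, not above thres, queue stays stopped
ring.complete(1)    # free == 5, above thres, queue wakes
```

In this model a larger ring leaves more descriptors between "full" and the threshold, so the queue reaches the stopped state less often under the same load, which is the motivation the commit gives for raising the sizes.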
    • rtnetlink: make the "split" NLM_DONE handling generic · 5b4b62a1
      Jakub Kicinski authored
      Jaroslav reports Dell's OMSA Systems Management Data Engine
      expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo()
      and inet_dump_ifaddr(). We already added a similar fix previously in
      commit 460b0d33 ("inet: bring NLM_DONE out to a separate recv() again")
      
      Instead of modifying all the dump handlers, and making them look
      different than modern for_each_netdev_dump()-based dump handlers -
      put the workaround in rtnetlink code. This will also help us move
      the custom rtnl-locking from af_netlink in the future (in net-next).
      
      Note that this change does not touch rtnl_dump_all(). rtnl_dump_all()
      is a different kettle of fish and a potential problem. We now mix
      families in a single recvmsg(), but NLM_DONE is not coalesced.
      
      Tested:
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \
                 --dump getaddr --json '{"ifa-family": 2}'
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \
                 --dump getroute --json '{"rtm-family": 2}'
      
        ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \
                 --dump getlink
      
      Fixes: 3e41af90 ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()")
      Fixes: cdb2f80f ("inet: use xa_array iterator to implement inet_dump_ifaddr()")
      Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
      Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXKE0g@mail.gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
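
The difference between a coalesced and a "split" NLM_DONE, as discussed in the rtnetlink commit above, can be pictured with a toy Python model. It is only a sketch: `dump_batches` and `per_recv` are hypothetical, messages are plain strings, and buffer sizes in bytes are ignored. Each inner list stands for what one recvmsg() call would return.

```python
def dump_batches(items, per_recv, split_done):
    """Return one list of messages per simulated recvmsg() call."""
    batches = [list(items[i:i + per_recv])
               for i in range(0, len(items), per_recv)]
    if split_done or not batches:
        # NLM_DONE travels alone in a final, separate recvmsg().
        batches.append(["NLM_DONE"])
    else:
        # NLM_DONE is coalesced into the last data batch.
        batches[-1].append("NLM_DONE")
    return batches


links = ["lo", "eth0", "eth1"]
coalesced = dump_batches(links, per_recv=2, split_done=False)
# [['lo', 'eth0'], ['eth1', 'NLM_DONE']]
split = dump_batches(links, per_recv=2, split_done=True)
# [['lo', 'eth0'], ['eth1'], ['NLM_DONE']]
```

A reader that expects the last recvmsg() to carry nothing but NLM_DONE, as the commit reports for the Dell tool, would only be satisfied by the second shape.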
    • Merge branch 'tcp-mptcp-close-wait' · e137596e
      David S. Miller authored
      Jason Xing says:
      
      ====================
      tcp/mptcp: count CLOSE-WAIT for CurrEstab
      
      Counting CLOSE-WAIT sockets in the CurrEstab counters is in accordance
      with RFC 1213, as suggested by Eric and Neal.
      
      v5
      Link: https://lore.kernel.org/all/20240531091753.75930-1-kerneljasonxing@gmail.com/
      1. add more detailed comment (Matthieu)
      
      v4
      Link: https://lore.kernel.org/all/20240530131308.59737-1-kerneljasonxing@gmail.com/
      1. correct the Fixes: tag in patch [2/2]. (Eric)
      
      Previous discussion
      Link: https://lore.kernel.org/all/20240529033104.33882-1-kerneljasonxing@gmail.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB · 9633e937
      Jason Xing authored
      As the previous patch does for TCP, we need to adhere to RFC 1213:
      
        "tcpCurrEstab OBJECT-TYPE
         ...
         The number of TCP connections for which the current state
         is either ESTABLISHED or CLOSE- WAIT."
      
      So let's consider CLOSE-WAIT sockets.
      
      The counting logic:
      
      When do we increment the counter?
      a) Only if we change the state to ESTABLISHED.
      
      When do we decrement the counter?
      a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
      say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
      b) if the socket leaves CLOSE-WAIT, say, on the server side, changing
      from CLOSE-WAIT to LAST-ACK.
      
      Fixes: d9cd27b8 ("mptcp: add CurrEstab MIB counter support")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB · a46d0ea5
      Jason Xing authored
      According to RFC 1213, we should also take CLOSE-WAIT sockets into
      consideration:
      
        "tcpCurrEstab OBJECT-TYPE
         ...
         The number of TCP connections for which the current state
         is either ESTABLISHED or CLOSE- WAIT."
      
      After this, the CurrEstab counter will display the total number of
      ESTABLISHED and CLOSE-WAIT sockets.
      
      The counting logic:
      
      When do we increment the counter?
      a) if we change the state to ESTABLISHED.
      b) if we change the state from SYN-RECEIVED to CLOSE-WAIT.
      
      When do we decrement the counter?
      a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT,
      say, on the client side, changing from ESTABLISHED to FIN-WAIT-1.
      b) if the socket leaves CLOSE-WAIT, say, on the server side, changing
      from CLOSE-WAIT to LAST-ACK.
      
      Please note: there are two old states from which a socket can be
      changed to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is
      ESTABLISHED. So we have to take care of the former case.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
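
The increment and decrement rules listed in the TCP commit above can be written down as a small transition function. The following Python sketch is an illustrative model, not the kernel implementation: `curr_estab_delta` and `count` are hypothetical helpers, and states are plain strings rather than kernel constants.

```python
def curr_estab_delta(old: str, new: str) -> int:
    """Change to apply to CurrEstab for an old -> new state transition."""
    if new == "ESTABLISHED":
        return 1 if old != "ESTABLISHED" else 0
    if new == "CLOSE_WAIT":
        # ESTABLISHED -> CLOSE_WAIT stays counted; SYN_RECV -> CLOSE_WAIT
        # was never counted before, so it starts being counted here.
        return 1 if old == "SYN_RECV" else 0
    # Any other new state: stop counting if the socket was counted before.
    return -1 if old in ("ESTABLISHED", "CLOSE_WAIT") else 0


def count(path):
    """Sum the deltas along a list of consecutive states."""
    return sum(curr_estab_delta(a, b) for a, b in zip(path, path[1:]))


# Server side: counted while ESTABLISHED and CLOSE_WAIT, released at LAST_ACK.
server = ["SYN_RECV", "ESTABLISHED", "CLOSE_WAIT", "LAST_ACK"]
print(count(server[:3]), count(server))  # prints "1 0"
```

Each socket contributes at most one to the counter, and every path that enters ESTABLISHED or CLOSE-WAIT and later leaves both nets out to zero, which is the invariant the listed rules are meant to preserve.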
    • selftests: hsr: add missing config for CONFIG_BRIDGE · 712115a2
      Hangbin Liu authored
      The hsr_redbox.sh test needs to create a bridge for testing. Add the
      missing CONFIG_BRIDGE option to the config file.
      
      Fixes: eafbf057 ("test: hsr: Extend the hsr_redbox.sh to have more SAN devices connected")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Tested-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Fix regression when dropping packets due to invalid src addresses · 1cd4bc98
      Daniel Borkmann authored
      Commit f58f45c1 ("vxlan: drop packets from invalid src-address")
      was recently added to vxlan, mainly in the context of source
      address snooping/learning, so that when learning is enabled, no FDB
      entry is created for an invalid address of the corresponding
      tunnel endpoint.
      
      Before commit f58f45c1, vxlan behaved like geneve in that it passed
      through whichever MACs were set in the L2 header. It turns out that
      this change in behavior breaks setups. For example, Cilium with
      netkit in L3 mode for Pods and tunnel mode passed for both vxlan and
      geneve before the change in f58f45c1. After that change it passes
      only for geneve: with vxlan, packets are dropped because
      vxlan_set_mac() returns false when the source and destination MACs
      are zero, which is totally fine for E/W traffic via the tunnel.
      
      Fix it by applying the is_valid_ether_addr() check from
      vxlan_set_mac() only when source address snooping/learning is
      actually enabled in vxlan. This is done by moving the check into
      vxlan_snoop(). With this change, the Cilium connectivity test suite
      passes again for both tunnel flavors.
      
      Fixes: f58f45c1 ("vxlan: drop packets from invalid src-address")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Bauer <mail@david-bauer.net>
      Cc: Ido Schimmel <idosch@nvidia.com>
      Cc: Nikolay Aleksandrov <razor@blackwall.org>
      Cc: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: David Bauer <mail@david-bauer.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
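
The behavior change described in the vxlan commit above comes down to where the source-address check sits. The Python sketch below is a simplified model under stated assumptions: `accept_before` and `accept_after` are hypothetical stand-ins for the receive path, and `is_valid_ether_addr` here is a local helper that treats an address as valid when it is neither all-zero nor multicast.

```python
def is_valid_ether_addr(mac: bytes) -> bool:
    """Valid means: six bytes, not all zero, multicast bit clear."""
    return len(mac) == 6 and any(mac) and not (mac[0] & 1)


def accept_before(src: bytes, learning: bool) -> bool:
    # Check applied to every packet, whether or not learning is enabled.
    return is_valid_ether_addr(src)


def accept_after(src: bytes, learning: bool) -> bool:
    # Check applied only on the learning (snooping) path.
    if learning and not is_valid_ether_addr(src):
        return False
    return True


zero_mac = bytes(6)
host_mac = bytes.fromhex("02aabbccddee")

dropped_before = not accept_before(zero_mac, learning=False)
passed_after = accept_after(zero_mac, learning=False)
still_dropped_when_learning = not accept_after(zero_mac, learning=True)
```

With learning disabled, zero-MAC frames pass again in the second variant, while the protection against learning an invalid address is kept.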
    • net: sched: sch_multiq: fix possible OOB write in multiq_tune() · affc18fd
      Hangyu Hua authored
      q->bands is set to qopt->bands after the kmalloc, and the subsequent
      code logic runs with that new value. So the old q->bands should not
      be used in the kmalloc. Otherwise, an out-of-bounds write will occur.
      
      Fixes: c2999f7f ("net: sched: multiq: don't call qdisc_put() while holding tree lock")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Acked-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
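
The sizing mismatch described in the sch_multiq commit above can be reproduced in a toy Python model, where a fixed-length list plays the role of the kmalloc'ed array and an `IndexError` plays the role of the out-of-bounds write. The function `tune` and its parameters are hypothetical, and the model assumes the worst case in which every band between the new count and the maximum produces one write.

```python
def tune(max_bands, old_bands, new_bands, size_with_new):
    """Return the number of slots written; IndexError marks an OOB write."""
    # Allocation sized either from the new band count or from the old one.
    sized_from = new_bands if size_with_new else old_bands
    removed = [None] * (max_bands - sized_from)

    # The loop runs with the new band count in both variants.
    n = 0
    for band in range(new_bands, max_bands):
        removed[n] = band  # raises IndexError once n reaches len(removed)
        n += 1
    return n


written = tune(max_bands=8, old_bands=6, new_bands=2, size_with_new=True)

try:
    tune(max_bands=8, old_bands=6, new_bands=2, size_with_new=False)
    overflowed = False
except IndexError:
    overflowed = True
```

When the band count shrinks, the array sized from the old count is shorter than the number of writes the loop performs, which is the situation the commit message warns about.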
    • ionic: fix kernel panic in XDP_TX action · 491aee89
      Taehee Yoo authored
      In the XDP_TX path, the ionic driver sends a packet to the TX path
      with the rx page and the corresponding dma address.
      After tx is done, ionic_tx_clean() frees that page.
      But the RX ring buffer isn't reset to NULL, so the driver uses a
      freed page, which causes a kernel panic.
      
      BUG: unable to handle page fault for address: ffff8881576c110c
      PGD 773801067 P4D 773801067 PUD 87f086067 PMD 87efca067 PTE 800ffffea893e060
      Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI
      CPU: 1 PID: 25 Comm: ksoftirqd/1 Not tainted 6.9.0+ #11
      Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
      RIP: 0010:bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
      Code: 00 53 41 55 41 56 41 57 b8 01 00 00 00 48 8b 5f 08 4c 8b 77 00 4c 89 f7 48 83 c7 0e 48 39 d8
      RSP: 0018:ffff888104e6fa28 EFLAGS: 00010283
      RAX: 0000000000000002 RBX: ffff8881576c1140 RCX: 0000000000000002
      RDX: ffffffffc0051f64 RSI: ffffc90002d33048 RDI: ffff8881576c110e
      RBP: ffff888104e6fa88 R08: 0000000000000000 R09: ffffed1027a04a23
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881b03a21a8
      R13: ffff8881589f800f R14: ffff8881576c1100 R15: 00000001576c1100
      FS: 0000000000000000(0000) GS:ffff88881ae00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8881576c110c CR3: 0000000767a90000 CR4: 00000000007506f0
      PKRU: 55555554
      Call Trace:
      <TASK>
      ? __die+0x20/0x70
      ? page_fault_oops+0x254/0x790
      ? __pfx_page_fault_oops+0x10/0x10
      ? __pfx_is_prefetch.constprop.0+0x10/0x10
      ? search_bpf_extables+0x165/0x260
      ? fixup_exception+0x4a/0x970
      ? exc_page_fault+0xcb/0xe0
      ? asm_exc_page_fault+0x22/0x30
      ? 0xffffffffc0051f64
      ? bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f
      ? do_raw_spin_unlock+0x54/0x220
      ionic_rx_service+0x11ab/0x3010 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? ionic_tx_clean+0x29b/0xc60 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_tx_clean+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? ionic_tx_cq_service+0x25d/0xa00 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ionic_cq_service+0x69/0x150 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      ionic_txrx_napi+0x11a/0x540 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864]
      __napi_poll.constprop.0+0xa0/0x440
      net_rx_action+0x7e7/0xc30
      ? __pfx_net_rx_action+0x10/0x10
      
      Fixes: 8eeed837 ("ionic: Add XDP_TX support")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Brett Creeley <brett.creeley@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>