1. 15 Dec, 2023 2 commits
  2. 14 Dec, 2023 12 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c7402612
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
      "Current release - regressions:
      
         - tcp: fix tcp_disordered_ack() vs usec TS resolution
      
        Current release - new code bugs:
      
         - dpll: sanitize possible null pointer dereference in
           dpll_pin_parent_pin_set()
      
         - eth: octeon_ep: initialise control mbox tasks before using APIs
      
        Previous releases - regressions:
      
         - io_uring/af_unix: disable sending io_uring over sockets
      
         - eth: mlx5e:
             - TC, don't offload post action rule if not supported
             - fix possible deadlock on mlx5e_tx_timeout_work
      
         - eth: iavf: fix iavf_shutdown to call iavf_remove instead iavf_close
      
         - eth: bnxt_en: fix skb recycling logic in bnxt_deliver_skb()
      
         - eth: ena: fix DMA syncing in XDP path when SWIOTLB is on
      
         - eth: team: fix use-after-free when an option instance allocation
           fails
      
        Previous releases - always broken:
      
         - neighbour: don't let neigh_forced_gc() disable preemption for long
      
         - net: prevent mss overflow in skb_segment()
      
         - ipv6: support reporting otherwise unknown prefix flags in
           RTM_NEWPREFIX
      
         - tcp: remove acked SYN flag from packet in the transmit queue
           correctly
      
         - eth: octeontx2-af:
             - fix a use-after-free in rvu_nix_register_reporters
             - fix promisc mcam entry action
      
         - eth: dwmac-loongson: make sure MDIO is initialized before use
      
         - eth: atlantic: fix double free in ring reinit logic"
      
      * tag 'net-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        net: atlantic: fix double free in ring reinit logic
        appletalk: Fix Use-After-Free in atalk_ioctl
        net: stmmac: Handle disabled MDIO busses from devicetree
        net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX
        dpaa2-switch: do not ask for MDB, VLAN and FDB replay
        dpaa2-switch: fix size of the dma_unmap
        net: prevent mss overflow in skb_segment()
        vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()
        Revert "tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set"
        MIPS: dts: loongson: drop incorrect dwmac fallback compatible
        stmmac: dwmac-loongson: drop useless check for compatible fallback
        stmmac: dwmac-loongson: Make sure MDIO is initialized before use
        tcp: disable tcp_autocorking for socket when TCP_NODELAY flag is set
        dpll: sanitize possible null pointer dereference in dpll_pin_parent_pin_set()
        net: ena: Fix XDP redirection error
        net: ena: Fix DMA syncing in XDP path when SWIOTLB is on
        net: ena: Fix xdp drops handling due to multibuf packets
        net: ena: Destroy correct number of xdp queues upon failure
        net: Remove acked SYN flag from packet in the transmit queue correctly
        qed: Fix a potential use-after-free in qed_cxt_tables_alloc
        ...
      c7402612
    • Linus Torvalds's avatar
      Merge tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · bdb2701f
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
        "Some fixes to quota accounting code, mostly around error handling and
         correctness:
      
         - free reserves on various error paths, after IO errors or
           transaction abort
      
         - don't clear reserved range at the folio release time, it'll be
           properly cleared after final write
      
         - fix integer overflow due to int used when passing around size of
           freed reservations
      
         - fix a regression in squota accounting that missed some cases with
           delayed refs"
      
      * tag 'for-6.7-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: ensure releasing squota reserve on head refs
        btrfs: don't clear qgroup reserved bit in release_folio
        btrfs: free qgroup pertrans reserve on transaction abort
        btrfs: fix qgroup_free_reserved_data int overflow
        btrfs: free qgroup reserve when ORDERED_IOERR is set
      bdb2701f
    • Igor Russkikh's avatar
      net: atlantic: fix double free in ring reinit logic · 7bb26ea7
      Igor Russkikh authored
      Driver has a logic leak in ring data allocation/free,
      where double free may happen in aq_ring_free if system is under
      stress and driver init/deinit is happening.
      
      The probability is higher to get this during suspend/resume cycle.
      
      Verification was done simulating same conditions with
      
          stress -m 2000 --vm-bytes 20M --vm-hang 10 --backoff 1000
          while true; do sudo ifconfig enp1s0 down; sudo ifconfig enp1s0 up; done
      
      Fixed by explicitly clearing pointers to NULL on deallocation
      
      Fixes: 018423e9 ("net: ethernet: aquantia: Add ring support code")
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Closes: https://lore.kernel.org/netdev/CAHk-=wiZZi7FcvqVSUirHBjx0bBUZ4dFrMDVLc3+3HCrtq0rBA@mail.gmail.com/Signed-off-by: default avatarIgor Russkikh <irusskikh@marvell.com>
      Link: https://lore.kernel.org/r/20231213094044.22988-1-irusskikh@marvell.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      7bb26ea7
    • Hyunwoo Kim's avatar
      appletalk: Fix Use-After-Free in atalk_ioctl · 189ff167
      Hyunwoo Kim authored
      Because atalk_ioctl() accesses sk->sk_receive_queue
      without holding a sk->sk_receive_queue.lock, it can
      cause a race with atalk_recvmsg().
      A use-after-free for skb occurs with the following flow.
      ```
      atalk_ioctl() -> skb_peek()
      atalk_recvmsg() -> skb_recv_datagram() -> skb_free_datagram()
      ```
      Add sk->sk_receive_queue.lock to atalk_ioctl() to fix this issue.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarHyunwoo Kim <v4bel@theori.io>
      Link: https://lore.kernel.org/r/20231213041056.GA519680@v4bel-B760M-AORUS-ELITE-AXSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      189ff167
    • Andrew Halaney's avatar
      net: stmmac: Handle disabled MDIO busses from devicetree · e23c0d21
      Andrew Halaney authored
      Many hardware configurations have the MDIO bus disabled, and are instead
      using some other MDIO bus to talk to the MAC's phy.
      
      of_mdiobus_register() returns -ENODEV in this case. Let's handle it
      gracefully instead of failing to probe the MAC.
      
      Fixes: 47dd7a54 ("net: add support for STMicroelectronics Ethernet controllers.")
      Signed-off-by: default avatarAndrew Halaney <ahalaney@redhat.com>
      Reviewed-by: default avatarSerge Semin <fancer.lancer@gmail.com>
      Link: https://lore.kernel.org/r/20231212-b4-stmmac-handle-mdio-enodev-v2-1-600171acf79f@redhat.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      e23c0d21
    • Sneh Shah's avatar
      net: stmmac: dwmac-qcom-ethqos: Fix drops in 10M SGMII RX · 981d947b
      Sneh Shah authored
      In 10M SGMII mode all the packets are being dropped due to wrong Rx clock.
      SGMII 10MBPS mode needs RX clock divider programmed to avoid drops in Rx.
      Update configure SGMII function with Rx clk divider programming.
      
      Fixes: 463120c3 ("net: stmmac: dwmac-qcom-ethqos: add support for SGMII")
      Tested-by: default avatarAndrew Halaney <ahalaney@redhat.com>
      Signed-off-by: default avatarSneh Shah <quic_snehshah@quicinc.com>
      Reviewed-by: default avatarBjorn Andersson <quic_bjorande@quicinc.com>
      Link: https://lore.kernel.org/r/20231212092208.22393-1-quic_snehshah@quicinc.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      981d947b
    • Jakub Kicinski's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 89e0c646
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-12-12 (iavf)
      
      This series contains updates to iavf driver only.
      
      Piotr reworks Flow Director states to deal with issues in restoring
      filters.
      
      Slawomir fixes shutdown processing as it was missing needed calls.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close
        iavf: Handle ntuple on/off based on new state machines for flow director
        iavf: Introduce new state machines for flow director
      ====================
      
      Link: https://lore.kernel.org/r/20231212203613.513423-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      89e0c646
    • Jakub Kicinski's avatar
      Merge branch 'dpaa2-switch-various-fixes' · dc84bb19
      Jakub Kicinski authored
      Ioana Ciornei says:
      
      ====================
      dpaa2-switch: various fixes
      
      The first patch fixes the size passed to two dma_unmap_single() calls
      which was wrongly put as the size of the pointer.
      
      The second patch is new to this series and reverts the behavior of the
      dpaa2-switch driver to not ask for object replay upon offloading so that
      we avoid the errors encountered when a VLAN is installed multiple times
      on the same port.
      ====================
      
      Link: https://lore.kernel.org/r/20231212164326.2753457-1-ioana.ciornei@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      dc84bb19
    • Ioana Ciornei's avatar
      dpaa2-switch: do not ask for MDB, VLAN and FDB replay · f24a49a3
      Ioana Ciornei authored
      Starting with commit 4e51bf44 ("net: bridge: move the switchdev
      object replay helpers to "push" mode") the switchdev_bridge_port_offload()
      helper was extended with the intention to provide switchdev drivers easy
      access to object addition and deletion replays. This works by calling
      the replay helpers with non-NULL notifier blocks.
      
      In the same commit, the dpaa2-switch driver was updated so that it
      passes valid notifier blocks to the helper. At that moment, no
      regression was identified through testing.
      
      In the meantime, the blamed commit changed the behavior in terms of
      which ports get hit by the replay. Before this commit, only the initial
      port which identified itself as offloaded through
      switchdev_bridge_port_offload() got a replay of all port objects and
      FDBs. After this, the newly joining port will trigger a replay of
      objects on all bridge ports and on the bridge itself.
      
      This behavior leads to errors in dpaa2_switch_port_vlans_add() when a
      VLAN gets installed on the same interface multiple times.
      
      The intended mechanism to address this is to pass a non-NULL ctx to the
      switchdev_bridge_port_offload() helper and then check it against the
      port's private structure. But since the driver does not have any use for
      the replayed port objects and FDBs until it gains support for LAG
      offload, it's better to fix the issue by reverting the dpaa2-switch
      driver to not ask for replay. The pointers will be added back when we
      are prepared to ignore replays on unrelated ports.
      
      Fixes: b28d580e ("net: bridge: switchdev: replay all VLAN groups")
      Signed-off-by: default avatarIoana Ciornei <ioana.ciornei@nxp.com>
      Link: https://lore.kernel.org/r/20231212164326.2753457-3-ioana.ciornei@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f24a49a3
    • Ioana Ciornei's avatar
      dpaa2-switch: fix size of the dma_unmap · 2aad7d41
      Ioana Ciornei authored
      The size of the DMA unmap was wrongly put as a sizeof of a pointer.
      Change the value of the DMA unmap to be the actual macro used for the
      allocation and the DMA map.
      
      Fixes: 1110318d ("dpaa2-switch: add tc flower hardware offload on ingress traffic")
      Signed-off-by: default avatarIoana Ciornei <ioana.ciornei@nxp.com>
      Link: https://lore.kernel.org/r/20231212164326.2753457-2-ioana.ciornei@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2aad7d41
    • Eric Dumazet's avatar
      net: prevent mss overflow in skb_segment() · 23d05d56
      Eric Dumazet authored
      Once again syzbot is able to crash the kernel in skb_segment() [1]
      
      GSO_BY_FRAGS is a forbidden value, but unfortunately the following
      computation in skb_segment() can reach it quite easily :
      
      	mss = mss * partial_segs;
      
      65535 = 3 * 5 * 17 * 257, so many initial values of mss can lead to
      a bad final result.
      
      Make sure to limit segmentation so that the new mss value is smaller
      than GSO_BY_FRAGS.
      
      [1]
      
      general protection fault, probably for non-canonical address 0xdffffc000000000e: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
      CPU: 1 PID: 5079 Comm: syz-executor993 Not tainted 6.7.0-rc4-syzkaller-00141-g1ae4cd3c #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
      RIP: 0010:skb_segment+0x181d/0x3f30 net/core/skbuff.c:4551
      Code: 83 e3 02 e9 fb ed ff ff e8 90 68 1c f9 48 8b 84 24 f8 00 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 8a 21 00 00 48 8b 84 24 f8 00
      RSP: 0018:ffffc900043473d0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000010046 RCX: ffffffff886b1597
      RDX: 000000000000000e RSI: ffffffff886b2520 RDI: 0000000000000070
      RBP: ffffc90004347578 R08: 0000000000000005 R09: 000000000000ffff
      R10: 000000000000ffff R11: 0000000000000002 R12: ffff888063202ac0
      R13: 0000000000010000 R14: 000000000000ffff R15: 0000000000000046
      FS: 0000555556e7e380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020010000 CR3: 0000000027ee2000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      udp6_ufo_fragment+0xa0e/0xd00 net/ipv6/udp_offload.c:109
      ipv6_gso_segment+0x534/0x17e0 net/ipv6/ip6_offload.c:120
      skb_mac_gso_segment+0x290/0x610 net/core/gso.c:53
      __skb_gso_segment+0x339/0x710 net/core/gso.c:124
      skb_gso_segment include/net/gso.h:83 [inline]
      validate_xmit_skb+0x36c/0xeb0 net/core/dev.c:3626
      __dev_queue_xmit+0x6f3/0x3d60 net/core/dev.c:4338
      dev_queue_xmit include/linux/netdevice.h:3134 [inline]
      packet_xmit+0x257/0x380 net/packet/af_packet.c:276
      packet_snd net/packet/af_packet.c:3087 [inline]
      packet_sendmsg+0x24c6/0x5220 net/packet/af_packet.c:3119
      sock_sendmsg_nosec net/socket.c:730 [inline]
      __sock_sendmsg+0xd5/0x180 net/socket.c:745
      __sys_sendto+0x255/0x340 net/socket.c:2190
      __do_sys_sendto net/socket.c:2202 [inline]
      __se_sys_sendto net/socket.c:2198 [inline]
      __x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
      do_syscall_x64 arch/x86/entry/common.c:52 [inline]
      do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
      entry_SYSCALL_64_after_hwframe+0x63/0x6b
      RIP: 0033:0x7f8692032aa9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 d1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff8d685418 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8692032aa9
      RDX: 0000000000010048 RSI: 00000000200000c0 RDI: 0000000000000003
      RBP: 00000000000f4240 R08: 0000000020000540 R09: 0000000000000014
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff8d685480
      R13: 0000000000000001 R14: 00007fff8d685480 R15: 0000000000000003
      </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      RIP: 0010:skb_segment+0x181d/0x3f30 net/core/skbuff.c:4551
      Code: 83 e3 02 e9 fb ed ff ff e8 90 68 1c f9 48 8b 84 24 f8 00 00 00 48 8d 78 70 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 8a 21 00 00 48 8b 84 24 f8 00
      RSP: 0018:ffffc900043473d0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000010046 RCX: ffffffff886b1597
      RDX: 000000000000000e RSI: ffffffff886b2520 RDI: 0000000000000070
      RBP: ffffc90004347578 R08: 0000000000000005 R09: 000000000000ffff
      R10: 000000000000ffff R11: 0000000000000002 R12: ffff888063202ac0
      R13: 0000000000010000 R14: 000000000000ffff R15: 0000000000000046
      FS: 0000555556e7e380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020010000 CR3: 0000000027ee2000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 3953c46c ("sk_buff: allow segmenting based on frag sizes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: default avatarWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20231212164621.4131800-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      23d05d56
    • Nikolay Kuratov's avatar
      vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space() · 60316d7f
      Nikolay Kuratov authored
      We need to do signed arithmetic if we expect condition
      `if (bytes < 0)` to be possible
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE
      
      Fixes: 06a8fc78 ("VSOCK: Introduce virtio_vsock_common.ko")
      Signed-off-by: default avatarNikolay Kuratov <kniv@yandex-team.ru>
      Reviewed-by: default avatarStefano Garzarella <sgarzare@redhat.com>
      Link: https://lore.kernel.org/r/20231211162317.4116625-1-kniv@yandex-team.ruSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      60316d7f
  3. 13 Dec, 2023 17 commits
  4. 12 Dec, 2023 9 commits
    • Dong Chenchen's avatar
      net: Remove acked SYN flag from packet in the transmit queue correctly · f99cd562
      Dong Chenchen authored
      syzkaller report:
      
       kernel BUG at net/core/skbuff.c:3452!
       invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc4-00009-gbee0e776-dirty #135
       RIP: 0010:skb_copy_and_csum_bits (net/core/skbuff.c:3452)
       Call Trace:
       icmp_glue_bits (net/ipv4/icmp.c:357)
       __ip_append_data.isra.0 (net/ipv4/ip_output.c:1165)
       ip_append_data (net/ipv4/ip_output.c:1362 net/ipv4/ip_output.c:1341)
       icmp_push_reply (net/ipv4/icmp.c:370)
       __icmp_send (./include/net/route.h:252 net/ipv4/icmp.c:772)
       ip_fragment.constprop.0 (./include/linux/skbuff.h:1234 net/ipv4/ip_output.c:592 net/ipv4/ip_output.c:577)
       __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:295)
       ip_output (net/ipv4/ip_output.c:427)
       __ip_queue_xmit (net/ipv4/ip_output.c:535)
       __tcp_transmit_skb (net/ipv4/tcp_output.c:1462)
       __tcp_retransmit_skb (net/ipv4/tcp_output.c:3387)
       tcp_retransmit_skb (net/ipv4/tcp_output.c:3404)
       tcp_retransmit_timer (net/ipv4/tcp_timer.c:604)
       tcp_write_timer (./include/linux/spinlock.h:391 net/ipv4/tcp_timer.c:716)
      
      The panic issue was trigered by tcp simultaneous initiation.
      The initiation process is as follows:
      
            TCP A                                            TCP B
      
        1.  CLOSED                                           CLOSED
      
        2.  SYN-SENT     --> <SEQ=100><CTL=SYN>              ...
      
        3.  SYN-RECEIVED <-- <SEQ=300><CTL=SYN>              <-- SYN-SENT
      
        4.               ... <SEQ=100><CTL=SYN>              --> SYN-RECEIVED
      
        5.  SYN-RECEIVED --> <SEQ=100><ACK=301><CTL=SYN,ACK> ...
      
        // TCP B: not send challenge ack for ack limit or packet loss
        // TCP A: close
      	tcp_close
      	   tcp_send_fin
                    if (!tskb && tcp_under_memory_pressure(sk))
                        tskb = skb_rb_last(&sk->tcp_rtx_queue); //pick SYN_ACK packet
                 TCP_SKB_CB(tskb)->tcp_flags |= TCPHDR_FIN;  // set FIN flag
      
        6.  FIN_WAIT_1  --> <SEQ=100><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ...
      
        // TCP B: send challenge ack to SYN_FIN_ACK
      
        7.               ... <SEQ=301><ACK=101><CTL=ACK>   <-- SYN-RECEIVED //challenge ack
      
        // TCP A:  <SND.UNA=101>
      
        8.  FIN_WAIT_1 --> <SEQ=101><ACK=301><END_SEQ=102><CTL=SYN,FIN,ACK> ... // retransmit panic
      
      	__tcp_retransmit_skb  //skb->len=0
      	    tcp_trim_head
      		len = tp->snd_una - TCP_SKB_CB(skb)->seq // len=101-100
      		    __pskb_trim_head
      			skb->data_len -= len // skb->len=-1, wrap around
      	    ... ...
      	    ip_fragment
      		icmp_glue_bits //BUG_ON
      
      If we use tcp_trim_head() to remove acked SYN from packet that contains data
      or other flags, skb->len will be incorrectly decremented. We can remove SYN
      flag that has been acked from rtx_queue earlier than tcp_trim_head(), which
      can fix the problem mentioned above.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Co-developed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDong Chenchen <dongchenchen2@huawei.com>
      Link: https://lore.kernel.org/r/20231210020200.1539875-1-dongchenchen2@huawei.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f99cd562
    • Dinghao Liu's avatar
      qed: Fix a potential use-after-free in qed_cxt_tables_alloc · b65d52ac
      Dinghao Liu authored
      qed_ilt_shadow_alloc() will call qed_ilt_shadow_free() to
      free p_hwfn->p_cxt_mngr->ilt_shadow on error. However,
      qed_cxt_tables_alloc() accesses the freed pointer on failure
      of qed_ilt_shadow_alloc() through calling qed_cxt_mngr_free(),
      which may lead to use-after-free. Fix this issue by setting
      p_mngr->ilt_shadow to NULL in qed_ilt_shadow_free().
      
      Fixes: fe56b9e6 ("qed: Add module with basic common support")
      Reviewed-by: default avatarPrzemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: default avatarDinghao Liu <dinghao.liu@zju.edu.cn>
      Link: https://lore.kernel.org/r/20231210045255.21383-1-dinghao.liu@zju.edu.cnSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b65d52ac
    • Linus Torvalds's avatar
      Merge tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · cf52eed7
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Fix various bugs / regressions for ext4, including a soft lockup, a
        WARN_ON, and a BUG"
      
      * tag 'ext4_for_linus-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        jbd2: fix soft lockup in journal_finish_inode_data_buffers()
        ext4: fix warning in ext4_dio_write_end_io()
        jbd2: increase the journal IO's priority
        jbd2: correct the printing of write_flags in jbd2_write_superblock()
        ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
      cf52eed7
    • Slawomir Laba's avatar
      iavf: Fix iavf_shutdown to call iavf_remove instead iavf_close · 7ae42ef3
      Slawomir Laba authored
      Make the flow for pci shutdown be the same to the pci remove.
      
      iavf_shutdown was implementing an incomplete version
      of iavf_remove. It misses several calls to the kernel like
      iavf_free_misc_irq, iavf_reset_interrupt_capability, iounmap
      that might break the system on reboot or hibernation.
      
      Implement the call of iavf_remove directly in iavf_shutdown to
      close this gap.
      
      Fixes below error messages (dmesg) during shutdown stress tests -
      [685814.900917] ice 0000:88:00.0: MAC 02:d0:5f:82:43:5d does not exist for
       VF 0
      [685814.900928] ice 0000:88:00.0: MAC 33:33:00:00:00:01 does not exist for
      VF 0
      
      Reproduction:
      
      1. Create one VF interface:
      echo 1 > /sys/class/net/<interface_name>/device/sriov_numvfs
      
      2. Run live dmesg on the host:
      dmesg -wH
      
      3. On SUT, script below steps into vf_namespace_assignment.sh
      
      <#!/bin/sh> // Remove <>. Git removes # line
      if=<VF name> (edit this per VF name)
      loop=0
      
      while true; do
      
      echo test round $loop
      let loop++
      
      ip netns add ns$loop
      ip link set dev $if up
      ip link set dev $if netns ns$loop
      ip netns exec ns$loop ip link set dev $if up
      ip netns exec ns$loop ip link set dev $if netns 1
      ip netns delete ns$loop
      
      done
      
      4. Run the script for at least 1000 iterations on SUT:
      ./vf_namespace_assignment.sh
      
      Expected result:
      No errors in dmesg.
      
      Fixes: 129cf89e ("iavf: rename functions and structs to new name")
      Signed-off-by: default avatarSlawomir Laba <slawomirx.laba@intel.com>
      Reviewed-by: default avatarMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Reviewed-by: default avatarJesse Brandeburg <jesse.brandeburg@intel.com>
      Co-developed-by: default avatarRanganatha Rao <ranganatha.rao@intel.com>
      Signed-off-by: default avatarRanganatha Rao <ranganatha.rao@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      7ae42ef3
    • Piotr Gardocki's avatar
      iavf: Handle ntuple on/off based on new state machines for flow director · 09d23b89
      Piotr Gardocki authored
      ntuple-filter feature on/off:
      Default is on. If turned off, the filters will be removed from both
      PF and iavf list. The removal is irrespective of current filter state.
      
      Steps to reproduce:
      -------------------
      
      1. Ensure ntuple is on.
      
      ethtool -K enp8s0 ntuple-filters on
      
      2. Create a filter to receive the traffic into non-default rx-queue like 15
      and ensure traffic is flowing into queue into 15.
      Now, turn off ntuple. Traffic should not flow to configured queue 15.
      It should flow to default RX queue.
      
      Fixes: 0dbfbabb ("iavf: Add framework to enable ethtool ntuple filters")
      Signed-off-by: default avatarPiotr Gardocki <piotrx.gardocki@intel.com>
      Reviewed-by: default avatarLarysa Zaremba <larysa.zaremba@intel.com>
      Signed-off-by: default avatarRanganatha Rao <ranganatha.rao@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      09d23b89
    • Piotr Gardocki's avatar
      iavf: Introduce new state machines for flow director · 3a0b5a29
      Piotr Gardocki authored
      New states introduced:
      
       IAVF_FDIR_FLTR_DIS_REQUEST
       IAVF_FDIR_FLTR_DIS_PENDING
       IAVF_FDIR_FLTR_INACTIVE
      
      Current FDIR state machines (SM) are not adequate to handle a few
      scenarios in the link DOWN/UP event, reset event and ntuple-feature.
      
      For example, when VF link goes DOWN and comes back UP administratively,
      the expectation is that previously installed filters should also be
      restored. But with current SM, filters are not restored.
      So with new SM, during link DOWN filters are marked as INACTIVE in
      the iavf list but removed from PF. After link UP, SM will transition
      from INACTIVE to ADD_REQUEST to restore the filter.
      
      Similarly, with VF reset, filters will be removed from the PF, but
      marked as INACTIVE in the iavf list. Filters will be restored after
      reset completion.
      
      Steps to reproduce:
      -------------------
      
      1. Create a VF. Here VF is enp8s0.
      
      2. Assign IP addresses to VF and link partner and ping continuously
      from remote. Here remote IP is 1.1.1.1.
      
      3. Check default RX Queue of traffic.
      
      ethtool -S enp8s0 | grep -E "rx-[[:digit:]]+\.packets"
      
      4. Add filter - change default RX Queue (to 15 here)
      
      ethtool -U ens8s0 flow-type ip4 src-ip 1.1.1.1 action 15 loc 5
      
      5. Ensure filter gets added and traffic is received on RX queue 15 now.
      
      Link event testing:
      -------------------
      6. Bring VF link down and up. If traffic flows to configured queue 15,
      test is success, otherwise it is a failure.
      
      Reset event testing:
      --------------------
      7. Reset the VF. If traffic flows to configured queue 15, test is success,
      otherwise it is a failure.
      
      Fixes: 0dbfbabb ("iavf: Add framework to enable ethtool ntuple filters")
      Signed-off-by: default avatarPiotr Gardocki <piotrx.gardocki@intel.com>
      Reviewed-by: default avatarLarysa Zaremba <larysa.zaremba@intel.com>
      Signed-off-by: default avatarRanganatha Rao <ranganatha.rao@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      3a0b5a29
    • Linus Torvalds's avatar
      Merge tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · eaadbbaa
      Linus Torvalds authored
      Pull fuse fixes from Miklos Szeredi:
      
       - Fix a couple of potential crashes, one introduced in 6.6 and one
         in 5.10
      
       - Fix misbehavior of virtiofs submounts on memory pressure
      
       - Clarify naming in the uAPI for a recent feature
      
      * tag 'fuse-fixes-6.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP
        fuse: dax: set fc->dax to NULL in fuse_dax_conn_free()
        fuse: share lookup state between submount and its parent
        docs/fuse-io: Document the usage of DIRECT_IO_ALLOW_MMAP
        fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP
      eaadbbaa
    • Linus Torvalds's avatar
      Merge tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 8b8cd4be
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
      
       - Memory leak fix (in lock error path)
      
       - Two fixes for create with allocation size
      
       - FIx for potential UAF in lease break error path
      
       - Five directory lease (caching) fixes found during additional recent
         testing
      
      * tag '6.7-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE
        ksmbd: fix wrong allocation size update in smb2_open()
        ksmbd: avoid duplicate opinfo_put() call on error of smb21_lease_break_ack()
        ksmbd: lazy v2 lease break on smb2_write()
        ksmbd: send v2 lease break notification for directory
        ksmbd: downgrade RWH lease caching state to RH for directory
        ksmbd: set v2 lease capability
        ksmbd: set epoch in create context v2 lease
        ksmbd: fix memory leak in smb2_lock()
      8b8cd4be
    • Ye Bin's avatar
      jbd2: fix soft lockup in journal_finish_inode_data_buffers() · 6c02757c
      Ye Bin authored
      There's issue when do io test:
      WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
      CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G  OE
      Call trace:
       dump_backtrace+0x0/0x1a0
       show_stack+0x24/0x30
       dump_stack+0xb0/0x100
       watchdog_timer_fn+0x254/0x3f8
       __hrtimer_run_queues+0x11c/0x380
       hrtimer_interrupt+0xfc/0x2f8
       arch_timer_handler_phys+0x38/0x58
       handle_percpu_devid_irq+0x90/0x248
       generic_handle_irq+0x3c/0x58
       __handle_domain_irq+0x68/0xc0
       gic_handle_irq+0x90/0x320
       el1_irq+0xcc/0x180
       queued_spin_lock_slowpath+0x1d8/0x320
       jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
       kjournald2+0xec/0x2f0 [jbd2]
       kthread+0x134/0x138
       ret_from_fork+0x10/0x18
      
      Analyzed informations from vmcore as follows:
      (1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
      (2) Now is processing the 855th jbd2_inode;
      (3) JBD2 task has TIF_NEED_RESCHED flag;
      (4) There's no pags in address_space around the 855th jbd2_inode;
      (5) There are some process is doing drop caches;
      (6) Mounted with 'nodioread_nolock' option;
      (7) 128 CPUs;
      
      According to informations from vmcore we know 'journal->j_list_lock' spin lock
      competition is fierce. So journal_finish_inode_data_buffers() maybe process
      slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors().
      However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK,
      will not call cond_resched(). So may lead to soft lockup.
      journal_finish_inode_data_buffers
        filemap_fdatawait_range_keep_errors
          __filemap_fdatawait_range
            while (index <= end)
              nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
              if (!nr_pages)
                 break;    --> If 'nr_pages' is equal zero will break, then will not call cond_resched()
              for (i = 0; i < nr_pages; i++)
                wait_on_page_writeback(page);
              cond_resched();
      
      To solve above issue, add scheduling point in the journal_finish_inode_data_buffers();
      Signed-off-by: default avatarYe Bin <yebin10@huawei.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.comSigned-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
      6c02757c