1. 23 Dec, 2021 7 commits
    • net/mlx5e: Wrap the tx reporter dump callback to extract the sq · 918fc385
      Amir Tzin authored
      Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct
      mlx5e_txqsq *, but in TX-timeout-recovery flow the argument is actually
      of type struct mlx5e_tx_timeout_ctx *.
      
       mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected
       mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000
       BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae)
       kernel stack overflow (page fault): 0000 [#1] SMP NOPTI
       CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core]
       RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
       [mlx5_core]
       Call Trace:
       mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core]
       devlink_health_do_dump.part.91+0x71/0xd0
       devlink_health_report+0x157/0x1b0
       mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core]
       ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0
       [mlx5_core]
       ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core]
       ? update_load_avg+0x19b/0x550
       ? set_next_entity+0x72/0x80
       ? pick_next_task_fair+0x227/0x340
       ? finish_task_switch+0xa2/0x280
         mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core]
         process_one_work+0x1de/0x3a0
         worker_thread+0x2d/0x3c0
       ? process_one_work+0x3a0/0x3a0
         kthread+0x115/0x130
       ? kthread_park+0x90/0x90
         ret_from_fork+0x1f/0x30
       --[ end trace 51ccabea504edaff ]---
       RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180
       PKRU: 55555554
       Kernel panic - not syncing: Fatal exception
       Kernel Offset: disabled
       end Kernel panic - not syncing: Fatal exception
      
      To fix this bug, add a wrapper for mlx5e_tx_reporter_dump_sq() that
      extracts the sq from struct mlx5e_tx_timeout_ctx, and set it as the
      TX-timeout-recovery flow dump callback.
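
The wrapper idea can be sketched in plain C as follows. This is only an illustration under stated assumptions: the struct and function names (sq_sketch, timeout_ctx_sketch, dump_sq, timeout_dump, run_dump) are hypothetical stand-ins, not the driver's definitions. It shows a callback that receives an untyped context pointer, and a wrapper that unpacks the real object before reusing the original callback.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; only the shape matters. */
struct sq_sketch { int sqn; };
struct timeout_ctx_sketch { struct sq_sketch *sq; int status; };

/* Dump callback that assumes its context is a struct sq_sketch. */
static int dump_sq(void *ctx)
{
    struct sq_sketch *sq = ctx;

    return sq->sqn;
}

/* Wrapper for the timeout flow: extract the sq, then reuse dump_sq(). */
static int timeout_dump(void *ctx)
{
    struct timeout_ctx_sketch *to_ctx = ctx;

    return dump_sq(to_ctx->sq);
}

/* Invoke a callback through an untyped context pointer. */
static int run_dump(int (*dump)(void *), void *ctx)
{
    return dump(ctx);
}
```

Passing a timeout context directly to dump_sq() would compile without complaint, because void * converts silently; pairing each context type with its own callback is what keeps the cast valid.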
      
      Fixes: 5f29458b ("net/mlx5e: Support dump callback in TX reporter")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Amir Tzin <amirtz@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix tc max supported prio for nic mode · d671e109
      Chris Mi authored
      Only prio 1 is supported if the firmware doesn't support ignore flow
      level for nic mode. The offending commit wrongly removed this check.
      Add it back.
      
      Fixes: 9a99c8f1 ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix SF health recovery flow · 33de865f
      Moshe Shemesh authored
      SFs do not directly control the PCI device. During the recovery flow an
      SF should not be allowed to do PCI disable or PCI reset; its PF will do it.
      
      It fixes the following kernel trace:
      mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow
      mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
      mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations
      mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532
      mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow
      mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called
      mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations
      mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532
      mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed,
      status bad resource state(0x9), syndrome (0x658908)
      mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed
      mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed
      
      Fixes: 1958fc2f ("net/mlx5: SF, Add auxiliary device driver")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix error print in case of IRQ request failed · aa968f92
      Shay Drory authored
      If the IRQ layer fails to find or request an IRQ, the driver prints
      the first CPU of the provided affinity as part of the error message.
      An empty affinity is a valid input for the IRQ layer, and it is an
      error to call cpumask_first() on an empty affinity.
      
      Remove the first cpu print from the error message.
      
      Fixes: c36326d3 ("net/mlx5: Round-Robin EQs over IRQs")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Use first online CPU instead of hard coded CPU · 26a7993c
      Shay Drory authored
      A hard-coded CPU (0 in our case) might be offline. Hence, use the first
      online CPU instead.
      
      Fixes: f891b7cd ("net/mlx5: Enable single IRQ for PCI Function")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix querying eswitch manager vport for ECPF · 624bf42c
      Yevgeny Kliteynik authored
      On BlueField the E-Switch manager is the ECPF (vport 0xFFFE), but when
      querying the capabilities of the ECPF eswitch manager, the driver needs
      to query vport 0 with other_vport = 0.
      
      Fixes: 9091b821 ("net/mlx5: DR, Handle eswitch manager and uplink vports separately")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources · 6b8b4258
      Miaoqian Lin authored
      The mlx5_get_uars_page() function returns error pointers, not NULL.
      Use IS_ERR() to check its return value.
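
Why a NULL test misses an error pointer can be shown with a small user-space sketch. Everything here is an assumption made for illustration: the sketch_* helpers only imitate the kernel's error-pointer convention (an errno value encoded in the top of the address range), and get_page_or_err() and init_resources() are hypothetical, not the driver's functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit for this sketch: errno values fit in the top 4095 addresses. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static inline long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static inline int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int page_storage;

/* Hypothetical allocator that reports failure as an error pointer, never NULL. */
static void *get_page_or_err(int fail)
{
    return fail ? sketch_err_ptr(-ENOMEM) : (void *)&page_storage;
}

static int init_resources(int fail)
{
    void *page = get_page_or_err(fail);

    /* A "!page" test would not catch the failure: the error pointer is non-NULL. */
    if (sketch_is_err(page))
        return (int)sketch_ptr_err(page);
    return 0;
}
```

With this convention the caller gets the precise error code back from the pointer itself, which is why the check has to match the convention the callee actually uses.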
      
      Fixes: 4ec9e7b0 ("net/mlx5: DR, Expose steering domain functionality")
      Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 22 Dec, 2021 8 commits
  3. 21 Dec, 2021 6 commits
    • igb: fix deadlock caused by taking RTNL in RPM resume path · ac8c58f5
      Heiner Kallweit authored
      Recent net core changes caused an issue with a few Intel drivers
      (reportedly igb), where taking RTNL in the RPM resume path results in a
      deadlock. See [0] for a bug report. I don't think the core changes
      are wrong, but taking RTNL in the RPM resume path isn't needed.
      The Intel drivers are the only ones doing this. See [1] for a
      discussion on the issue. The following patch changes the RPM resume
      path to not take RTNL.
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=215129
      [1] https://lore.kernel.org/netdev/20211125074949.5f897431@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/t/
      
      Fixes: bd869245 ("net: core: try to runtime-resume detached device in __dev_open")
      Fixes: f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops")
      Tested-by: Martin Stolpe <martin.stolpe@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20211220201844.2714498-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gve: Correct order of processing device options · 1f06f7d9
      Jeroen de Borst authored
      The legacy raw addressing device option was processed before the
      new RDA queue format option. As a result, the supported features mask,
      which is provided only on the RDA queue format option, was not set.

      This disabled jumbo-frame support when using raw addressing.
      
      Fixes: 255489f5 ("gve: Add a jumbo-frame device option")
      Signed-off-by: Jeroen de Borst <jeroendb@google.com>
      Link: https://lore.kernel.org/r/20211220192746.2900594-1-jeroendb@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: skip virtio_net_hdr_set_proto if protocol already set · 1ed1d592
      Willem de Bruijn authored
      virtio_net_hdr_set_proto infers skb->protocol from the virtio_net_hdr
      gso_type, to avoid packets getting dropped for lack of a proto type.
      
      Its protocol choice is a guess, especially in the case of UFO, where
      the single VIRTIO_NET_HDR_GSO_UDP label covers both UFOv4 and UFOv6.
      
      Skip this best effort if the field is already initialized, whether
      explicitly from userspace or implicitly based on an earlier call to
      dev_parse_header_protocol (which is more robust, but was introduced
      after this patch).
      
      Fixes: 9d2f67e4 ("net/packet: fix packet drop as of virtio gso")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220145027.2784293-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: accept UFOv6 packages in virtio_net_hdr_to_skb · 7e5cced9
      Willem de Bruijn authored
      Skb with skb->protocol 0 at the time of virtio_net_hdr_to_skb may have
      a protocol inferred from virtio_net_hdr with virtio_net_hdr_set_proto.
      
      Unlike TCP, UDP does not have separate types for IPv4 and IPv6. Type
      VIRTIO_NET_HDR_GSO_UDP is guessed to be IPv4/UDP. As of the below
      commit, UFOv6 packets are dropped due to not matching the protocol as
      obtained from dev_parse_header_protocol.
      
      Invert the test to take that L2 protocol field as starting point and
      pass both UFOv4 and UFOv6 for VIRTIO_NET_HDR_GSO_UDP.
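
The inverted test can be sketched as a standalone function. This is a simplified illustration, not the kernel helper: the function name proto_matches_gso is hypothetical, and the constants are repeated locally (with the values used by the Linux UAPI headers for the GSO types and EtherTypes) so that the sketch stands alone. It starts from the L2 protocol and lists the GSO types acceptable for it, so the single UDP type passes for both IPv4 and IPv6.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the relevant constants, named differently on purpose. */
#define SKETCH_GSO_TCPV4 1
#define SKETCH_GSO_UDP   3
#define SKETCH_GSO_TCPV6 4
#define SKETCH_P_IP      0x0800
#define SKETCH_P_IPV6    0x86DD

/* Start from the L2 protocol and accept every GSO type valid for it. */
static bool proto_matches_gso(uint16_t l2_proto, uint8_t gso_type)
{
    switch (l2_proto) {
    case SKETCH_P_IP:
        return gso_type == SKETCH_GSO_TCPV4 || gso_type == SKETCH_GSO_UDP;
    case SKETCH_P_IPV6:
        return gso_type == SKETCH_GSO_TCPV6 || gso_type == SKETCH_GSO_UDP;
    default:
        return false;
    }
}
```

Mapping in the other direction (GSO type to one guessed protocol) is what forced UDP to be treated as IPv4 only.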
      
      Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      Link: https://lore.kernel.org/netdev/CABcq3pG9GRCYqFDBAJ48H1vpnnX=41u+MhQnayF1ztLH4WX0Fw@mail.gmail.com/
      Reported-by: Andrew Melnichenko <andrew@daynix.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220144901.2784030-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • docs: networking: replace skb_hwtstamp_tx with skb_tstamp_tx · a9725e1d
      Willem de Bruijn authored
      Tiny doc fix. The hardware transmit function was called skb_tstamp_tx
      from its introduction in commit ac45f602 ("net: infrastructure for
      hardware time stamping") in the same series as this documentation.
      
      Fixes: cb9eff09 ("net: new user space API for time stamping of incoming and outgoing packets")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220144608.2783526-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • inet: fully convert sk->sk_rx_dst to RCU rules · 8f905c0e
      Eric Dumazet authored
      syzbot reported various issues around early demux,
      one being included in this changelog [1]
      
      sk->sk_rx_dst is using RCU protection without clearly
      documenting it.
      
      The following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      do not follow standard RCU rules.
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      They look wrong because a delete operation of RCU protected
      pointer is supposed to clear the pointer before
      the call_rcu()/synchronize_rcu() guarding actual memory freeing.
      
      In some cases indeed, dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
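
The ordering rule (unpublish the pointer first, release the object afterwards) can be sketched in user space. This is an analogy using C11 atomics, not the kernel RCU API, and all names (dst_sketch, rx_sock, drop_rx_dst, release_count) are hypothetical. The counter merely stands in for the deferred free that would follow a grace period.

```c
#include <stdatomic.h>
#include <stddef.h>

struct dst_sketch { int refcnt; };
struct rx_sock { _Atomic(struct dst_sketch *) rx_dst; };

/* Counts objects whose last reference was dropped (stand-in for a deferred free). */
static int release_count;

static void dst_release_sketch(struct dst_sketch *d)
{
    if (d && --d->refcnt == 0)
        release_count++;
}

/* Unpublish first, then release: no new reader can find the pointer afterwards. */
static void drop_rx_dst(struct rx_sock *sk)
{
    struct dst_sketch *old = atomic_exchange(&sk->rx_dst, NULL);

    dst_release_sketch(old);
}
```

Releasing before clearing, as in sequence [a] then [b], leaves a window in which a concurrent reader can still load a pointer to an object that is already on its way to being freed.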
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 20 Dec, 2021 3 commits
  5. 18 Dec, 2021 13 commits
    • qlcnic: potential dereference null pointer of rx_queue->page_ring · 60ec7fcf
      Jiasheng Jiang authored
      The return value of kcalloc() needs to be checked to avoid
      dereferencing a NULL pointer if the allocation fails.
      Therefore, it is better to change the return type of
      qlcnic_sriov_alloc_vlans() so that it returns -ENOMEM when the
      allocation fails and 0 otherwise.
      Also, qlcnic_sriov_set_guest_vlan_mode() and __qlcnic_pci_sriov_enable()
      should handle the return value of qlcnic_sriov_alloc_vlans().
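
The pattern described above (an allocation helper that returns an error code, and callers that propagate it) can be sketched in user-space C. The names vf_sketch, alloc_vlans, enable_vf and failing_calloc are hypothetical, and the allocator is passed in as a parameter only so that the failure path can be exercised; the driver itself uses kcalloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*calloc_fn)(size_t nmemb, size_t size);

struct vf_sketch { unsigned short *vlans; size_t num_vlans; };

/* Allocator that always fails, used to exercise the error path. */
static void *failing_calloc(size_t nmemb, size_t size)
{
    (void)nmemb;
    (void)size;
    return NULL;
}

/* Returns 0 on success and -ENOMEM if the allocation fails. */
static int alloc_vlans(struct vf_sketch *vf, size_t n, calloc_fn alloc)
{
    vf->vlans = alloc(n, sizeof(*vf->vlans));
    if (!vf->vlans)
        return -ENOMEM;
    vf->num_vlans = n;
    return 0;
}

/* Caller checks the result before touching the array. */
static int enable_vf(struct vf_sketch *vf, size_t n, calloc_fn alloc)
{
    int err = alloc_vlans(vf, n, alloc);

    if (err)
        return err;
    vf->vlans[0] = 1;
    return 0;
}
```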
      
      Fixes: 154d0c81 ("qlcnic: VLAN enhancement for 84XX adapters")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ax25: NPD bug when detaching AX25 device · 1ade48d0
      Lin Ma authored
      The existing cleanup routine is not well synchronized with the
      syscall routines. When a device is detaching, the race below can
      occur.
      
      static int ax25_sendmsg(...) {
        ...
        lock_sock()
        ax25 = sk_to_ax25(sk);
        if (ax25->ax25_dev == NULL) // CHECK
        ...
        ax25_queue_xmit(skb, ax25->ax25_dev->dev); // USE
        ...
      }
      
      static void ax25_kill_by_device(...) {
        ...
        if (s->ax25_dev == ax25_dev) {
          s->ax25_dev = NULL;
          ...
      }
      
      Other syscall functions such as ax25_getsockopt, ax25_getname and
      ax25_info_show suffer from similar races. To fix them, this patch
      introduces lock_sock() into ax25_kill_by_device to guarantee that
      the nullify action in the cleanup routine cannot proceed while another
      socket request is pending.
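
The effect of taking the same lock on both paths can be sketched in user space with a pthread mutex standing in for the socket lock. The names ax_dev_sketch, ax_sock_sketch, send_msg and kill_by_device are hypothetical and the logic is reduced to the check/use pair and the nullify step.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct ax_dev_sketch { int id; };

struct ax_sock_sketch {
    pthread_mutex_t lock;         /* stands in for the socket lock */
    struct ax_dev_sketch *dev;
};

/* Syscall side: the check and the use happen under one lock hold. */
static int send_msg(struct ax_sock_sketch *s)
{
    int ret;

    pthread_mutex_lock(&s->lock);
    if (s->dev == NULL)           /* CHECK */
        ret = -ENETUNREACH;
    else
        ret = s->dev->id;         /* USE: dev cannot be cleared while locked */
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/* Cleanup side: the nullify step takes the same lock. */
static void kill_by_device(struct ax_sock_sketch *s, const struct ax_dev_sketch *d)
{
    pthread_mutex_lock(&s->lock);
    if (s->dev == d)
        s->dev = NULL;
    pthread_mutex_unlock(&s->lock);
}
```

Because both functions serialize on the lock, the pointer can no longer change between the check and the use.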
      Signed-off-by: Hanjie Wu <nagi@zju.edu.cn>
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hamradio: improve the incomplete fix to avoid NPD · b2f37aea
      Lin Ma authored
      The previous commit 3e0588c2 ("hamradio: defer ax25 kfree after
      unregister_netdev") reordered the kfree and unregister_netdev
      operations to prevent a UAF.

      This commit improves the previous one by also deferring the nullify of
      the ax->tty pointer. Otherwise, a NULL pointer dereference bug occurs.
      Part of the stack trace is shown below.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000538
      RIP: 0010:ax_xmit+0x1f9/0x400
      ...
      Call Trace:
       dev_hard_start_xmit+0xec/0x320
       sch_direct_xmit+0xea/0x240
       __qdisc_run+0x166/0x5c0
       __dev_queue_xmit+0x2c7/0xaf0
       ax25_std_establish_data_link+0x59/0x60
       ax25_connect+0x3a0/0x500
       ? security_socket_connect+0x2b/0x40
       __sys_connect+0x96/0xc0
       ? __hrtimer_init+0xc0/0xc0
       ? common_nsleep+0x2e/0x50
       ? switch_fpu_return+0x139/0x1a0
       __x64_sys_connect+0x11/0x20
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The crash point is shown below:
      
      static void ax_encaps(...) {
        ...
        set_bit(TTY_DO_WRITE_WAKEUP, &ax->tty->flags); // ax->tty = NULL!
        ...
      }
      
      By placing the nullify action after unregister_netdev(), the ax->tty
      pointer is not set to NULL until the net_device framework layer is
      well synchronized.
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · aa3cc8a9
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-12-17
      
      Maciej Fijalkowski says:
      
      It seems that the previous [0] Rx fix was not enough and there are still
      issues with AF_XDP Rx ZC support in the ice driver. Elza reported that
      for multiple XSK sockets configured on a single netdev, some of them
      were becoming dead after a while. We have spotted more things that
      needed to be addressed this time. More information can be found in the
      particular commit messages.
      
      It also carries Alexandr's patch that was sent previously which was
      overlapping with this set.
      
      [0]: https://lore.kernel.org/bpf/20211129231746.2767739-1-anthony.l.nguyen@intel.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: avoid double free in tun_free_netdev · 158b515f
      George Kennedy authored
      Avoid double free in tun_free_netdev() by moving the
      dev->tstats and tun->security allocs to a new ndo_init routine
      (tun_net_init()) that will be called by register_netdevice().
      ndo_init is paired with the destructor (tun_free_netdev()),
      so if there's an error in register_netdevice() the destructor
      will handle the frees.
      
      BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x1a/0x20 security/selinux/hooks.c:5605
      
      CPU: 0 PID: 25750 Comm: syz-executor416 Not tainted 5.16.0-rc2-syzk #1
      Hardware name: Red Hat KVM, BIOS
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x89/0xb5 lib/dump_stack.c:106
      print_address_description.constprop.9+0x28/0x160 mm/kasan/report.c:247
      kasan_report_invalid_free+0x55/0x80 mm/kasan/report.c:372
      ____kasan_slab_free mm/kasan/common.c:346 [inline]
      __kasan_slab_free+0x107/0x120 mm/kasan/common.c:374
      kasan_slab_free include/linux/kasan.h:235 [inline]
      slab_free_hook mm/slub.c:1723 [inline]
      slab_free_freelist_hook mm/slub.c:1749 [inline]
      slab_free mm/slub.c:3513 [inline]
      kfree+0xac/0x2d0 mm/slub.c:4561
      selinux_tun_dev_free_security+0x1a/0x20 security/selinux/hooks.c:5605
      security_tun_dev_free_security+0x4f/0x90 security/security.c:2342
      tun_free_netdev+0xe6/0x150 drivers/net/tun.c:2215
      netdev_run_todo+0x4df/0x840 net/core/dev.c:10627
      rtnl_unlock+0x13/0x20 net/core/rtnetlink.c:112
      __tun_chr_ioctl+0x80c/0x2870 drivers/net/tun.c:3302
      tun_chr_ioctl+0x2f/0x40 drivers/net/tun.c:3311
      vfs_ioctl fs/ioctl.c:51 [inline]
      __do_sys_ioctl fs/ioctl.c:874 [inline]
      __se_sys_ioctl fs/ioctl.c:860 [inline]
      __x64_sys_ioctl+0x19d/0x220 fs/ioctl.c:860
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: George Kennedy <george.kennedy@oracle.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/1639679132-19884-1-git-send-email-george.kennedy@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: marvell: prestera: fix incorrect structure access · 2efc2256
      Yevhen Orlov authored
      In the line:
      	upper = info->upper_dev;
      we access the upper_dev field, which is valid only for particular events
      (e.g. event == NETDEV_CHANGEUPPER). This line therefore causes an invalid
      memory access for other events, when ptr is not a
      netdev_notifier_changeupper_info.
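
      The pattern described above can be sketched in plain C as a guard
      that checks the event before treating the pointer as the larger
      structure. This is only an illustration under stated assumptions:
      every demo_* name is hypothetical and stands in for the kernel's
      notifier types, and it is not the code of the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's notifier event types. */
enum demo_event { DEMO_REGISTER, DEMO_CHANGEUPPER };

/* Hypothetical common header that every event payload starts with. */
struct demo_info {
	int dev_id;
};

/* Hypothetical extended payload, delivered only with DEMO_CHANGEUPPER. */
struct demo_changeupper_info {
	struct demo_info info;	/* must stay the first member */
	int upper_dev_id;	/* exists only in this payload */
};

/* Return the upper device id, or -1 when the event carries none. */
static int demo_upper_of(enum demo_event event, void *ptr)
{
	struct demo_changeupper_info *cu;

	if (event != DEMO_CHANGEUPPER)
		return -1;	/* ptr is only a struct demo_info here */

	cu = ptr;		/* the event guarantees the larger type */
	return cu->upper_dev_id;
}
```

      Reading upper_dev_id before the event check would touch memory past
      the end of a plain struct demo_info, which is the kind of access
      KASAN reports as stack-out-of-bounds.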
      
      The KASAN logs are as follows:
      
      [   30.123165] BUG: KASAN: stack-out-of-bounds in prestera_netdev_port_event.constprop.0+0x68/0x538 [prestera]
      [   30.133336] Read of size 8 at addr ffff80000cf772b0 by task udevd/778
      [   30.139866]
      [   30.141398] CPU: 0 PID: 778 Comm: udevd Not tainted 5.16.0-rc3 #6
      [   30.147588] Hardware name: DNI AmazonGo1 A7040 board (DT)
      [   30.153056] Call trace:
      [   30.155547]  dump_backtrace+0x0/0x2c0
      [   30.159320]  show_stack+0x18/0x30
      [   30.162729]  dump_stack_lvl+0x68/0x84
      [   30.166491]  print_address_description.constprop.0+0x74/0x2b8
      [   30.172346]  kasan_report+0x1e8/0x250
      [   30.176102]  __asan_load8+0x98/0xe0
      [   30.179682]  prestera_netdev_port_event.constprop.0+0x68/0x538 [prestera]
      [   30.186847]  prestera_netdev_event_handler+0x1b4/0x1c0 [prestera]
      [   30.193313]  raw_notifier_call_chain+0x74/0xa0
      [   30.197860]  call_netdevice_notifiers_info+0x68/0xc0
      [   30.202924]  register_netdevice+0x3cc/0x760
      [   30.207190]  register_netdev+0x24/0x50
      [   30.211015]  prestera_device_register+0x8a0/0xba0 [prestera]
      
      Fixes: 3d5048cc ("net: marvell: prestera: move netdev topology validation to prestera_main")
      Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20211216171714.11341-1-yevhen.orlov@plvision.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2efc2256
    • Yevhen Orlov's avatar
      net: marvell: prestera: fix incorrect return of port_find · 8b681bd7
      Yevhen Orlov authored
      When some ports are in the list but the requested one is not found,
      we return the last iterator state instead of NULL as expected.
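
      A minimal sketch of this class of bug, assuming a circular list with
      a sentinel head (the layout the kernel's list iterators walk). All
      demo_* names are hypothetical and this is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port node on a circular list with a sentinel head. */
struct demo_port {
	int id;
	struct demo_port *next;
};

/* Return the port with the given id, or NULL when it is not on the list. */
static struct demo_port *demo_port_find(struct demo_port *head, int id)
{
	struct demo_port *port;

	for (port = head->next; port != head; port = port->next)
		if (port->id == id)
			return port;	/* real match */

	/*
	 * The loop ended with port == head: a non-NULL cursor that is not
	 * a match. Returning the cursor here would be the bug, so report
	 * the miss explicitly.
	 */
	return NULL;
}
```

      Callers that test the result against NULL only behave correctly when
      the miss path returns NULL rather than the leftover cursor.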
      
      Fixes: 501ef306 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
      Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20211216170736.8851-1-yevhen.orlov@plvision.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8b681bd7
    • Hoang Le's avatar
      Revert "tipc: use consistent GFP flags" · f845fe58
      Hoang Le authored
      This reverts commit 86c3a3e9.
      
      The tipc_aead_init() function can be called from an interrupt routine.
      An allocation with the GFP_KERNEL flag might sleep there, hence the
      following BUG is reported.
      
      [   17.657509] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:230
      [   17.660916] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/3
      [   17.664093] preempt_count: 302, expected: 0
      [   17.665619] RCU nest depth: 2, expected: 0
      [   17.667163] Preemption disabled at:
      [   17.667165] [<0000000000000000>] 0x0
      [   17.669753] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         5.16.0-rc4+ #1
      [   17.673006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      [   17.675540] Call Trace:
      [   17.676285]  <IRQ>
      [   17.676913]  dump_stack_lvl+0x34/0x44
      [   17.678033]  __might_resched.cold+0xd6/0x10f
      [   17.679311]  kmem_cache_alloc_trace+0x14d/0x220
      [   17.680663]  tipc_crypto_start+0x4a/0x2b0 [tipc]
      [   17.682146]  ? kmem_cache_alloc_trace+0xd3/0x220
      [   17.683545]  tipc_node_create+0x2f0/0x790 [tipc]
      [   17.684956]  tipc_node_check_dest+0x72/0x680 [tipc]
      [   17.686706]  ? ___cache_free+0x31/0x350
      [   17.688008]  ? skb_release_data+0x128/0x140
      [   17.689431]  tipc_disc_rcv+0x479/0x510 [tipc]
      [   17.690904]  tipc_rcv+0x71c/0x730 [tipc]
      [   17.692219]  ? __netif_receive_skb_core+0xb7/0xf60
      [   17.693856]  tipc_l2_rcv_msg+0x5e/0x90 [tipc]
      [   17.695333]  __netif_receive_skb_list_core+0x20b/0x260
      [   17.697072]  netif_receive_skb_list_internal+0x1bf/0x2e0
      [   17.698870]  ? dev_gro_receive+0x4c2/0x680
      [   17.700255]  napi_complete_done+0x6f/0x180
      [   17.701657]  virtnet_poll+0x29c/0x42e [virtio_net]
      [   17.703262]  __napi_poll+0x2c/0x170
      [   17.704429]  net_rx_action+0x22f/0x280
      [   17.705706]  __do_softirq+0xfd/0x30a
      [   17.706921]  common_interrupt+0xa4/0xc0
      [   17.708206]  </IRQ>
      [   17.708922]  <TASK>
      [   17.709651]  asm_common_interrupt+0x1e/0x40
      [   17.711078] RIP: 0010:default_idle+0x18/0x20
      
      Fixes: 86c3a3e9 ("tipc: use consistent GFP flags")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Link: https://lore.kernel.org/r/20211217030059.5947-1-hoang.h.le@dektech.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f845fe58
    • Aleksander Jan Bajkowski's avatar
      net: lantiq_xrx200: increase buffer reservation · 1488fc20
      Aleksander Jan Bajkowski authored
      If the user sets a lower mtu on the CPU port than on the switch,
      then DMA inserts a few more bytes into the buffer than expected.
      In the worst case, it may exceed the size of the buffer. The
      experiments showed that the buffer should be a multiple of the
      burst length value. This patch rounds the length of the rx buffer
      upwards and fixes this bug. The reservation of FCS space in the
      buffer has been removed as PMAC strips the FCS.
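
      The rounding step can be sketched as below. The helper name and the
      sizes used in the usage note are hypothetical; the message does not
      state the actual burst length, so none is assumed here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round len up to the next multiple of burst_bytes.
 * burst_bytes must be greater than zero.
 */
static size_t demo_round_up(size_t len, size_t burst_bytes)
{
	return (len + burst_bytes - 1) / burst_bytes * burst_bytes;
}
```

      With an illustrative burst of 16 bytes, a 1518-byte buffer would grow
      to 1520 bytes, while a 1520-byte buffer would stay unchanged.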
      
      Fixes: 998ac358 ("net: lantiq: add support for jumbo frames")
      Reported-by: Thomas Nixon <tom@tomn.co.uk>
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1488fc20
    • Jakub Kicinski's avatar
      Merge branch 'net-sched-fix-ct-zone-matching-for-invalid-conntrack-state' · 14193d57
      Jakub Kicinski authored
      Paul Blakey says:
      
      ====================
      net/sched: Fix ct zone matching for invalid conntrack state
      
      Currently, when a packet is marked as invalid conntrack_in in act_ct,
      post_ct will be set, and connection info (nf_conn) will be removed
      from the skb. Later openvswitch and flower matching will parse this
      as ct_state=+trk+inv. But because the connection info is missing,
      there is also no zone info to match against even though the packet
      is tracked.
      
      This series fixes that, by passing the last executed zone by act_ct.
      The zone info is passed along from act_ct to the ct flow dissector
      (used by flower to extract zone info) and to ovs, the same way as post_ct
      is passed, via qdisc layer skb cb to dissector, and via skb extension
      to OVS.
      
      Adding any more data to the qdisc skb cb would leave no room for the
      BPF skb cb to extend it and stay under the skb->cb size, so this series
      moves the tc related info from within the qdisc skb cb to a tc specific
      cb that also extends it.
      ====================
      
      Link: https://lore.kernel.org/r/20211214172435.24207-1-paulb@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      14193d57
    • Paul Blakey's avatar
      net: openvswitch: Fix matching zone id for invalid conns arriving from tc · 635d448a
      Paul Blakey authored
      Zone id is not restored if we passed ct and ct rejected the connection,
      as there is no ct info on the skb.
      
      Save the zone from tc skb cb to tc skb extension and pass it on to
      ovs, use that info to restore the zone id for invalid connections.
      
      Fixes: d29334c1 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      635d448a
    • Paul Blakey's avatar
      net/sched: flow_dissector: Fix matching on zone id for invalid conns · 38495958
      Paul Blakey authored
      If ct rejects a flow, it removes the conntrack info from the skb.
      act_ct sets the post_ct variable so the dissector will see this case
      as a +tracked +invalid state, but the zone id is lost with the
      conntrack info.
      
      To restore the zone id on such cases, set the last executed zone,
      via the tc control block, when passing ct, and read it back in the
      dissector if there is no ct info on the skb (invalid connection).
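
      The fallback described above can be sketched as follows. The demo_*
      structures and field names are hypothetical simplifications of the
      skb, the conntrack entry, and the tc control block; this is an
      illustration of the lookup order, not the dissector's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conntrack entry carrying the zone id. */
struct demo_conn {
	uint16_t zone;
};

/* Hypothetical packet view: ct info plus the values act_ct left behind. */
struct demo_skb {
	const struct demo_conn *ct;	/* NULL once ct rejected the flow */
	bool post_ct;			/* set when the packet passed ct */
	uint16_t last_zone;		/* last executed zone */
};

/* Pick the zone id to match on. */
static uint16_t demo_dissect_zone(const struct demo_skb *skb)
{
	if (skb->ct)
		return skb->ct->zone;	/* valid connection: use its zone */
	if (skb->post_ct)
		return skb->last_zone;	/* invalid connection: saved zone */
	return 0;			/* untracked: no zone to report */
}
```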
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      38495958
    • Paul Blakey's avatar
      net/sched: Extend qdisc control block with tc control block · ec624fe7
      Paul Blakey authored
      BPF layer extends the qdisc control block via struct bpf_skb_data_end
      and because of that there is no more room to add variables to the
      qdisc layer control block without going over the skb->cb size.
      
      Extend the qdisc control block with a tc control block,
      and move all tc related variables to there as a pre-step for
      extending the tc control block with additional members.
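
      The layering can be sketched with one structure embedding the other
      as its first member, plus compile-time checks against the 48-byte
      skb->cb area. The demo_* structures and their fields are
      hypothetical and do not reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SKB_CB_SIZE 48	/* size of the skb->cb scratch area */

/* Hypothetical qdisc-layer control block. */
struct demo_qdisc_cb {
	unsigned int pkt_len;
	unsigned short queue_mapping;
	unsigned char data[20];
};

/* Hypothetical tc control block: wraps the qdisc block, owns tc fields. */
struct demo_tc_cb {
	struct demo_qdisc_cb qdisc_cb;	/* must stay the first member */
	unsigned short mru;
	unsigned char post_ct;
};

/* The qdisc view and the tc view must start at the same address. */
_Static_assert(offsetof(struct demo_tc_cb, qdisc_cb) == 0,
	       "qdisc cb must be the first member");

/* The combined block must still fit in the scratch area. */
_Static_assert(sizeof(struct demo_tc_cb) <= DEMO_SKB_CB_SIZE,
	       "tc cb exceeds the skb->cb size");

/* Qdisc-layer code keeps using its own view of the same memory. */
static struct demo_qdisc_cb *demo_qdisc_cb_of(struct demo_tc_cb *cb)
{
	return &cb->qdisc_cb;
}
```

      Because the size check runs at compile time, a member added later
      that overflows the area would fail the build instead of corrupting
      neighbouring data at run time.
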
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ec624fe7
  6. 17 Dec, 2021 3 commits