1. 11 Apr, 2024 7 commits
    • netfilter: flowtable: incorrect pppoe tuple · 6db5dc7b
      Pablo Neira Ayuso authored
      PPPoE traffic reaching the ingress path does not match the flowtable entry
      because the PPPoE header is expected to be at the network header offset.
      This bug causes a mismatch in the flowtable lookup, so PPPoE packets
      enter the classical forwarding path.
      
      Fixes: 72efd585 ("netfilter: flowtable: add pppoe support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: flowtable: validate pppoe header · 87b3593b
      Pablo Neira Ayuso authored
      Ensure there is sufficient room to access the protocol field of the
      PPPoE header. Validate it once before the flowtable lookup, then use a
      helper function to access the protocol field.
      
      Reported-by: syzbot+b6f07e1c07ef40199081@syzkaller.appspotmail.com
      Fixes: 72efd585 ("netfilter: flowtable: add pppoe support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
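To illustrate the bounds check described in 87b3593b, the userspace sketch below confirms that a buffer holds a complete PPPoE session header at a given offset before it reads the PPP protocol field. The name `pppoe_proto_get`, its parameters, and the two length macros are assumptions made for this example; this is not the helper added by the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define PPPOE_HDR_LEN  6u /* ver/type (1), code (1), session id (2), length (2) */
#define PPPOE_SES_HLEN 8u /* PPPoE header plus the 2-byte PPP protocol field */

/*
 * Hypothetical helper: read the PPP protocol that follows the PPPoE header
 * located at 'offset' inside 'buf'. Returns 0 on success and stores the
 * protocol in host byte order, or -1 if the buffer is too short.
 */
int pppoe_proto_get(const uint8_t *buf, size_t len, size_t offset,
                    uint16_t *proto)
{
    /* Validate once, before touching any header byte. */
    if (offset > len || len - offset < PPPOE_SES_HLEN)
        return -1;

    *proto = (uint16_t)((buf[offset + PPPOE_HDR_LEN] << 8) |
                        buf[offset + PPPOE_HDR_LEN + 1]);
    return 0;
}
```

Passing the offset explicitly also reflects the point of 6db5dc7b: the lookup only matches if the header is read from the position where it actually sits.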
    • netfilter: nft_set_pipapo: do not free live element · 3cfc9ec0
      Florian Westphal authored
      Pablo reports a crash with large batches of elements with a
      back-to-back add/remove pattern.  Quoting Pablo:
      
        add_elem("00000000") timeout 100 ms
        ...
        add_elem("0000000X") timeout 100 ms
        del_elem("0000000X") <---------------- delete one that was just added
        ...
        add_elem("00005000") timeout 100 ms
      
        1) nft_pipapo_remove() removes element 0000000X
        Then, KASAN shows a splat.
      
      Looking at the remove function there is a chance that we will drop a
      rule that maps to a non-deactivated element.
      
      Removal happens in two steps, first we do a lookup for key k and return the
      to-be-removed element and mark it as inactive in the next generation.
      Then, in a second step, the element gets removed from the set/map.
      
      The _remove function does not work correctly if more than one
      element shares the same key.

      This can happen if we insert an element into a set when the set already
      holds an element with the same key, but the element mapping to the existing
      key has timed out or is not active in the next generation.

      In such a case it is possible that removal will unmap the wrong element.
      If this happens, we leak the non-deactivated element, which becomes
      unreachable.

      The element that got deactivated (and will be freed later) will
      remain reachable in the set data structure. This can result in
      a crash when such an element is retrieved during lookup (stale
      pointer).
      
      Add a check that the fully matching key does in fact map to the element
      that we have marked as inactive in the deactivation step.
      If not, we need to continue searching.
      
      Add a bug/warn trap at the end of the function as well, the remove
      function must not ever be called with an invisible/unreachable/non-existent
      element.
      
      v2: avoid unneeded temporary variable (Stefano)
      
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
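The check described in 3cfc9ec0 (a matching key must also map to the element that was deactivated, otherwise keep searching) can be sketched with a flat array standing in for the set. The types `demo_slot` and the function `demo_remove` are hypothetical and far simpler than the pipapo data structure; they only show the shape of the fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a key -> element mapping inside a set. */
struct demo_slot {
    const char *key;   /* NULL marks an empty slot */
    const void *elem;  /* element this key maps to */
};

/*
 * Unmap the slot whose key matches AND whose element is the one the caller
 * deactivated earlier. Returns the slot index, or -1 if no such slot exists
 * (the "must never happen" case the commit adds a warning for).
 */
int demo_remove(struct demo_slot *slots, size_t n, const char *key,
                const void *elem)
{
    for (size_t i = 0; i < n; i++) {
        if (slots[i].key == NULL || strcmp(slots[i].key, key) != 0)
            continue;
        if (slots[i].elem != elem)
            continue; /* same key, different element: keep searching */

        slots[i].key = NULL;
        slots[i].elem = NULL;
        return (int)i;
    }
    return -1;
}
```

Without the element comparison, the first slot with a matching key would be unmapped, which is the wrong-element removal the commit message describes.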
    • netfilter: nft_set_pipapo: walk over current view on netlink dump · 29b359cf
      Pablo Neira Ayuso authored
      The generation mask can be updated while a netlink dump is in progress.
      The pipapo set backend walk iterator cannot rely on it to infer which
      view of the data structure is to be used. Add a notation to specify whether
      the user wants to read or update the set.
      
      Based on patch from Florian Westphal.
      
      Fixes: 2b84e215 ("netfilter: nft_set_pipapo: .walk does not deal with generations")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
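The idea in 29b359cf, letting the caller state its intent instead of inferring the view from the generation mask, might look roughly like the sketch below. Everything here is assumed for illustration: the enum `demo_iter_type`, the struct `demo_set` with its `live` and `clone` arrays, and the function `demo_walk` are not the kernel's names or layout.

```c
#include <stddef.h>

/* Hypothetical: the caller states whether it reads or updates the set. */
enum demo_iter_type { DEMO_ITER_READ, DEMO_ITER_UPDATE };

struct demo_set {
    const int *live;  /* view used by readers such as a dump (assumption) */
    const int *clone; /* working copy used while preparing updates (assumption) */
    size_t n;         /* number of elements in either view */
};

/* Walk the view selected explicitly by the caller and sum its elements. */
long demo_walk(const struct demo_set *set, enum demo_iter_type type)
{
    const int *view = (type == DEMO_ITER_UPDATE) ? set->clone : set->live;
    long sum = 0;

    for (size_t i = 0; i < set->n; i++)
        sum += view[i];
    return sum;
}
```

Because the view is chosen from an argument rather than from shared state, a concurrent change to that state cannot switch views halfway through a walk.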
    • netfilter: br_netfilter: skip conntrack input hook for promisc packets · 751de201
      Pablo Neira Ayuso authored
      For historical reasons, when the bridge device is in promisc mode, packets
      that are directed to the taps follow the bridge input hook path. This patch
      adds a workaround to reset conntrack for these packets.
      
      Jianbo Liu reports warning splats in their test infrastructure where
      cloned packets reach the br_netfilter input hook to confirm the
      conntrack object.
      
      Scratch one bit from BR_INPUT_SKB_CB to annotate that this packet has
      reached the input hook because it is passed up to the bridge device to
      reach the taps.
      
      [   57.571874] WARNING: CPU: 1 PID: 0 at net/bridge/br_netfilter_hooks.c:616 br_nf_local_in+0x157/0x180 [br_netfilter]
      [   57.572749] Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5ctl mlx5_core
      [   57.575158] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0+ #19
      [   57.575700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [   57.576662] RIP: 0010:br_nf_local_in+0x157/0x180 [br_netfilter]
      [   57.577195] Code: fe ff ff 41 bd 04 00 00 00 be 04 00 00 00 e9 4a ff ff ff be 04 00 00 00 48 89 ef e8 f3 a9 3c e1 66 83 ad b4 00 00 00 04 eb 91 <0f> 0b e9 f1 fe ff ff 0f 0b e9 df fe ff ff 48 89 df e8 b3 53 47 e1
      [   57.578722] RSP: 0018:ffff88885f845a08 EFLAGS: 00010202
      [   57.579207] RAX: 0000000000000002 RBX: ffff88812dfe8000 RCX: 0000000000000000
      [   57.579830] RDX: ffff88885f845a60 RSI: ffff8881022dc300 RDI: 0000000000000000
      [   57.580454] RBP: ffff88885f845a60 R08: 0000000000000001 R09: 0000000000000003
      [   57.581076] R10: 00000000ffff1300 R11: 0000000000000002 R12: 0000000000000000
      [   57.581695] R13: ffff8881047ffe00 R14: ffff888108dbee00 R15: ffff88814519b800
      [   57.582313] FS:  0000000000000000(0000) GS:ffff88885f840000(0000) knlGS:0000000000000000
      [   57.583040] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   57.583564] CR2: 000000c4206aa000 CR3: 0000000103847001 CR4: 0000000000370eb0
      [   57.584194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   57.584820] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   57.585440] Call Trace:
      [   57.585721]  <IRQ>
      [   57.585976]  ? __warn+0x7d/0x130
      [   57.586323]  ? br_nf_local_in+0x157/0x180 [br_netfilter]
      [   57.586811]  ? report_bug+0xf1/0x1c0
      [   57.587177]  ? handle_bug+0x3f/0x70
      [   57.587539]  ? exc_invalid_op+0x13/0x60
      [   57.587929]  ? asm_exc_invalid_op+0x16/0x20
      [   57.588336]  ? br_nf_local_in+0x157/0x180 [br_netfilter]
      [   57.588825]  nf_hook_slow+0x3d/0xd0
      [   57.589188]  ? br_handle_vlan+0x4b/0x110
      [   57.589579]  br_pass_frame_up+0xfc/0x150
      [   57.589970]  ? br_port_flags_change+0x40/0x40
      [   57.590396]  br_handle_frame_finish+0x346/0x5e0
      [   57.590837]  ? ipt_do_table+0x32e/0x430
      [   57.591221]  ? br_handle_local_finish+0x20/0x20
      [   57.591656]  br_nf_hook_thresh+0x4b/0xf0 [br_netfilter]
      [   57.592286]  ? br_handle_local_finish+0x20/0x20
      [   57.592802]  br_nf_pre_routing_finish+0x178/0x480 [br_netfilter]
      [   57.593348]  ? br_handle_local_finish+0x20/0x20
      [   57.593782]  ? nf_nat_ipv4_pre_routing+0x25/0x60 [nf_nat]
      [   57.594279]  br_nf_pre_routing+0x24c/0x550 [br_netfilter]
      [   57.594780]  ? br_nf_hook_thresh+0xf0/0xf0 [br_netfilter]
      [   57.595280]  br_handle_frame+0x1f3/0x3d0
      [   57.595676]  ? br_handle_local_finish+0x20/0x20
      [   57.596118]  ? br_handle_frame_finish+0x5e0/0x5e0
      [   57.596566]  __netif_receive_skb_core+0x25b/0xfc0
      [   57.597017]  ? __napi_build_skb+0x37/0x40
      [   57.597418]  __netif_receive_skb_list_core+0xfb/0x220
      
      Fixes: 62e7151a ("netfilter: bridge: confirm multicast packets before passing them up the stack")
      Reported-by: Jianbo Liu <jianbol@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get() · d78d867d
      Ziyang Xuan authored
      nft_unregister_obj() can run concurrently with __nft_obj_type_get(),
      and nothing protects the iteration over the nf_tables_objects
      list in __nft_obj_type_get(). Therefore, there is a potential data race
      on nf_tables_objects list entries.
      
      Use list_for_each_entry_rcu() to iterate over nf_tables_objects
      list in __nft_obj_type_get(), and use rcu_read_lock() in the caller
      nft_obj_type_get() to protect the entire type query process.
      
      Fixes: e5009240 ("netfilter: nf_tables: add stateful objects")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get() · f969eb84
      Ziyang Xuan authored
      nft_unregister_expr() can run concurrently with __nft_expr_type_get(),
      and nothing protects the iteration over the nf_tables_expressions
      list in __nft_expr_type_get(). Therefore, there is a potential data race
      on nf_tables_expressions list entries.
      
      Use list_for_each_entry_rcu() to iterate over nf_tables_expressions
      list in __nft_expr_type_get(), and use rcu_read_lock() in the caller
      nft_expr_type_get() to protect the entire type query process.
      
      Fixes: ef1f7df9 ("netfilter: nf_tables: expression ops overloading")
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 10 Apr, 2024 7 commits
    • r8169: fix LED-related deadlock on module removal · 19fa4f2a
      Heiner Kallweit authored
      Binding devm_led_classdev_register() to the netdev is problematic
      because on module removal we get an RTNL-related deadlock. Fix this
      by avoiding the device-managed LED functions.

      Note: We can safely call led_classdev_unregister() for an LED even
      if registering it failed, because led_classdev_unregister() detects
      this and is a no-op in this case.
      
      Fixes: 18764b88 ("r8169: add support for LED's on RTL8168/RTL8101")
      Cc: stable@vger.kernel.org
      Reported-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pds_core: Fix pdsc_check_pci_health function to use work thread · 81665adf
      Brett Creeley authored
      When the driver notices fw_status == 0xff it tries to perform a PCI
      reset on itself via pci_reset_function() in the context of the driver's
      health thread. However, pdsc_reset_prepare() calls
      pdsc_stop_health_thread(), which attempts to stop/flush the health
      thread. This results in a deadlock because the stop/flush will never
      complete since the driver called pci_reset_function() from the health
      thread context. Fix by changing pdsc_check_pci_health()
      to queue a newly introduced pdsc_pci_reset_thread() on the pdsc's
      work queue.
      
      Unloading the driver in the fw_down/dead state uncovered another issue,
      which can be seen in the following trace:
      
      WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
      [...]
      RIP: 0010:__queue_work+0x358/0x440
      [...]
      Call Trace:
       <TASK>
       ? __warn+0x85/0x140
       ? __queue_work+0x358/0x440
       ? report_bug+0xfc/0x1e0
       ? handle_bug+0x3f/0x70
       ? exc_invalid_op+0x17/0x70
       ? asm_exc_invalid_op+0x1a/0x20
       ? __queue_work+0x358/0x440
       queue_work_on+0x28/0x30
       pdsc_devcmd_locked+0x96/0xe0 [pds_core]
       pdsc_devcmd_reset+0x71/0xb0 [pds_core]
       pdsc_teardown+0x51/0xe0 [pds_core]
       pdsc_remove+0x106/0x200 [pds_core]
       pci_device_remove+0x37/0xc0
       device_release_driver_internal+0xae/0x140
       driver_detach+0x48/0x90
       bus_remove_driver+0x6d/0xf0
       pci_unregister_driver+0x2e/0xa0
       pdsc_cleanup_module+0x10/0x780 [pds_core]
       __x64_sys_delete_module+0x142/0x2b0
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fbd9d03a14b
      [...]
      
      Fix this by preventing the devcmd reset if the FW is not running.
      
      Fixes: d9407ff1 ("pds_core: Prevent health thread from running during reset/remove")
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr · 7633c4da
      Jiri Benc authored
      Although ipv6_get_ifaddr walks inet6_addr_lst under the RCU lock, it
      still means hlist_for_each_entry_rcu can return an item that got removed
      from the list. The memory itself of such item is not freed thanks to RCU
      but nothing guarantees the actual content of the memory is sane.
      
      In particular, the reference count can be zero. This can happen if
      ipv6_del_addr is called in parallel. ipv6_del_addr removes the entry
      from inet6_addr_lst (hlist_del_init_rcu(&ifp->addr_lst)) and drops all
      references (__in6_ifa_put(ifp) + in6_ifa_put(ifp)). With bad enough
      timing, this can happen:
      
      1. In ipv6_get_ifaddr, hlist_for_each_entry_rcu returns an entry.
      
      2. Then, the whole ipv6_del_addr is executed for the given entry. The
         reference count drops to zero and kfree_rcu is scheduled.
      
      3. ipv6_get_ifaddr continues and tries to increment the reference count
         (in6_ifa_hold).

      4. The RCU read lock is released and the entry is freed.

      5. The freed entry is returned.

      Prevent the reference count from being incremented in such a case. The name
      in6_ifa_hold_safe is chosen to mimic the existing fib6_info_hold_safe.
      
      [   41.506330] refcount_t: addition on 0; use-after-free.
      [   41.506760] WARNING: CPU: 0 PID: 595 at lib/refcount.c:25 refcount_warn_saturate+0xa5/0x130
      [   41.507413] Modules linked in: veth bridge stp llc
      [   41.507821] CPU: 0 PID: 595 Comm: python3 Not tainted 6.9.0-rc2.main-00208-g49563be8 #14
      [   41.508479] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      [   41.509163] RIP: 0010:refcount_warn_saturate+0xa5/0x130
      [   41.509586] Code: ad ff 90 0f 0b 90 90 c3 cc cc cc cc 80 3d c0 30 ad 01 00 75 a0 c6 05 b7 30 ad 01 01 90 48 c7 c7 38 cc 7a 8c e8 cc 18 ad ff 90 <0f> 0b 90 90 c3 cc cc cc cc 80 3d 98 30 ad 01 00 0f 85 75 ff ff ff
      [   41.510956] RSP: 0018:ffffbda3c026baf0 EFLAGS: 00010282
      [   41.511368] RAX: 0000000000000000 RBX: ffff9e9c46914800 RCX: 0000000000000000
      [   41.511910] RDX: ffff9e9c7ec29c00 RSI: ffff9e9c7ec1c900 RDI: ffff9e9c7ec1c900
      [   41.512445] RBP: ffff9e9c43660c9c R08: 0000000000009ffb R09: 00000000ffffdfff
      [   41.512998] R10: 00000000ffffdfff R11: ffffffff8ca58a40 R12: ffff9e9c4339a000
      [   41.513534] R13: 0000000000000001 R14: ffff9e9c438a0000 R15: ffffbda3c026bb48
      [   41.514086] FS:  00007fbc4cda1740(0000) GS:ffff9e9c7ec00000(0000) knlGS:0000000000000000
      [   41.514726] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   41.515176] CR2: 000056233b337d88 CR3: 000000000376e006 CR4: 0000000000370ef0
      [   41.515713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   41.516252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   41.516799] Call Trace:
      [   41.517037]  <TASK>
      [   41.517249]  ? __warn+0x7b/0x120
      [   41.517535]  ? refcount_warn_saturate+0xa5/0x130
      [   41.517923]  ? report_bug+0x164/0x190
      [   41.518240]  ? handle_bug+0x3d/0x70
      [   41.518541]  ? exc_invalid_op+0x17/0x70
      [   41.520972]  ? asm_exc_invalid_op+0x1a/0x20
      [   41.521325]  ? refcount_warn_saturate+0xa5/0x130
      [   41.521708]  ipv6_get_ifaddr+0xda/0xe0
      [   41.522035]  inet6_rtm_getaddr+0x342/0x3f0
      [   41.522376]  ? __pfx_inet6_rtm_getaddr+0x10/0x10
      [   41.522758]  rtnetlink_rcv_msg+0x334/0x3d0
      [   41.523102]  ? netlink_unicast+0x30f/0x390
      [   41.523445]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
      [   41.523832]  netlink_rcv_skb+0x53/0x100
      [   41.524157]  netlink_unicast+0x23b/0x390
      [   41.524484]  netlink_sendmsg+0x1f2/0x440
      [   41.524826]  __sys_sendto+0x1d8/0x1f0
      [   41.525145]  __x64_sys_sendto+0x1f/0x30
      [   41.525467]  do_syscall_64+0xa5/0x1b0
      [   41.525794]  entry_SYSCALL_64_after_hwframe+0x72/0x7a
      [   41.526213] RIP: 0033:0x7fbc4cfcea9a
      [   41.526528] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
      [   41.527942] RSP: 002b:00007ffcf54012a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   41.528593] RAX: ffffffffffffffda RBX: 00007ffcf5401368 RCX: 00007fbc4cfcea9a
      [   41.529173] RDX: 000000000000002c RSI: 00007fbc4b9d9bd0 RDI: 0000000000000005
      [   41.529786] RBP: 00007fbc4bafb040 R08: 00007ffcf54013e0 R09: 000000000000000c
      [   41.530375] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [   41.530977] R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007fbc4ca85d1b
      [   41.531573]  </TASK>
      
      Fixes: 5c578aed ("IPv6: convert addrconf hash list to RCU")
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Link: https://lore.kernel.org/r/8ab821e36073a4a406c50ec83c9e8dc586c539e4.1712585809.git.jbenc@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
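The race described in 7633c4da comes down to taking a reference on an object whose count may already have reached zero. A minimal userspace sketch of the "hold only if still alive" pattern, using C11 atomics, is shown below. The struct `demo_obj` and the function `demo_hold_safe` are hypothetical; the kernel uses its own refcount primitives rather than this loop.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct demo_obj {
    atomic_int refcnt;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true if a reference was taken, false if the object is already
 * on its way to being freed and must not be used.
 */
bool demo_hold_safe(struct demo_obj *obj)
{
    int old = atomic_load(&obj->refcnt);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

A lookup that finds an entry would call the safe variant and treat a `false` result as "not found", instead of returning an entry whose memory is about to be released.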
    • Merge branch 'net-start-to-replace-copy_from_sockptr' · 7b6575c6
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      net: start to replace copy_from_sockptr()
      
      We got several syzbot reports about unsafe copy_from_sockptr()
      calls. After fixing some of them, it appears that we could
      use a new helper to factorize all the checks in one place.
      
      This series targets net tree, we can later start converting
      many call sites in net-next.
      ====================
      
      Link: https://lore.kernel.org/r/20240408082845.3957374-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies · 7a87441c
      Eric Dumazet authored
      syzbot reported unsafe calls to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
      BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
      Read of size 4 at addr ffff88801caa1ec3 by task syz-executor459/5078
      
      CPU: 0 PID: 5078 Comm: syz-executor459 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
        do_sock_setsockopt+0x3b1/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfd/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7f7fac07fd89
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fff660eb788 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7fac07fd89
      RDX: 0000000000000000 RSI: 0000000000000118 RDI: 0000000000000004
      RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020000a80 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20240408082845.3957374-4-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mISDN: fix MISDN_TIME_STAMP handling · 138b7878
      Eric Dumazet authored
      syzbot reports one unsafe call to copy_from_sockptr() [1]
      
      Use copy_safe_from_sockptr() instead.
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
      Read of size 4 at addr ffff0000c6d54083 by task syz-executor406/6167
      
      CPU: 1 PID: 6167 Comm: syz-executor406 Not tainted 6.8.0-rc7-syzkaller-g707081b61156 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call trace:
        dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
        show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x178/0x518 mm/kasan/report.c:488
        kasan_report+0xd8/0x138 mm/kasan/report.c:601
        __asan_report_load_n_noabort+0x1c/0x28 mm/kasan/report_generic.c:391
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        data_sock_setsockopt+0x46c/0x4cc drivers/isdn/mISDN/socket.c:417
        do_sock_setsockopt+0x2a0/0x4e0 net/socket.c:2311
        __sys_setsockopt+0x128/0x1a8 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __arm64_sys_setsockopt+0xb8/0xd4 net/socket.c:2340
        __invoke_syscall arch/arm64/kernel/syscall.c:34 [inline]
        invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:48
        el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:133
        do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:152
        el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
        el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
        el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
      
      Fixes: 1b2b03f8 ("Add mISDN core files")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Link: https://lore.kernel.org/r/20240408082845.3957374-3-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: add copy_safe_from_sockptr() helper · 6309863b
      Eric Dumazet authored
      The copy_from_sockptr() helper is unsafe unless callers
      first check the user-provided optlen.

      Too many callers get this wrong, so let's add a helper to
      fix them and avoid future copy/paste bugs.

      Instead of:

         if (optlen < sizeof(opt)) {
             err = -EINVAL;
             break;
         }
         if (copy_from_sockptr(&opt, optval, sizeof(opt))) {
             err = -EFAULT;
             break;
         }

      Use:

         err = copy_safe_from_sockptr(&opt, sizeof(opt),
                                      optval, optlen);
         if (err)
             break;
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408082845.3957374-2-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
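The pattern introduced in 6309863b (and applied in 7a87441c and 138b7878) folds the length check and the copy into one call. The userspace analogue below uses plain pointers and `memcpy()` in place of `sockptr_t` and the kernel copy routine; the name `demo_copy_safe` is an assumption for this sketch and it is not the kernel helper itself.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue: copy exactly 'ksize' bytes into 'dst',
 * but only if the caller-supplied buffer is at least that long.
 * Returns 0 on success or -EINVAL if 'optlen' is too small.
 */
int demo_copy_safe(void *dst, size_t ksize, const void *optval, size_t optlen)
{
    if (optlen < ksize)
        return -EINVAL; /* refuse to read past the end of optval */

    memcpy(dst, optval, ksize);
    return 0;
}
```

Because the check lives inside the helper, a call site cannot forget it, which is the class of bug behind the KASAN reports quoted in the two setsockopt fixes.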
  3. 09 Apr, 2024 6 commits
    • ipv4/route: avoid unused-but-set-variable warning · cf1b7201
      Arnd Bergmann authored
      The log_martians variable is only used in an #ifdef, causing a 'make W=1'
      warning with gcc:
      
      net/ipv4/route.c: In function 'ip_rt_send_redirect':
      net/ipv4/route.c:880:13: error: variable 'log_martians' set but not used [-Werror=unused-but-set-variable]
      
      Change the #ifdef to an equivalent IS_ENABLED() to let the compiler
      see where the variable is used.
      
      Fixes: 30038fc6 ("net: ip_rt_send_redirect() optimization")
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-2-arnd@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
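The approach in cf1b7201, replacing a preprocessor guard with a compile-time constant condition so the compiler still sees the variable being used, can be sketched outside the kernel as follows. `DEMO_VERBOSE` is a hypothetical 0/1 macro standing in for a Kconfig option tested with IS_ENABLED(), and `demo_send_redirect` is an invented function; the real IS_ENABLED() macro also handles undefined symbols, which this stand-in does not.

```c
#ifndef DEMO_VERBOSE
#define DEMO_VERBOSE 0 /* hypothetical stand-in for a Kconfig option */
#endif

/*
 * Returns 1 if a log message would have been emitted, 0 otherwise.
 * 'log_martians' is set unconditionally and read inside an ordinary 'if',
 * so it is never "set but not used" even when DEMO_VERBOSE is 0. The
 * compiler can still drop the dead branch.
 */
int demo_send_redirect(int conf_log_martians)
{
    int log_martians = conf_log_martians;
    int logged = 0;

    if (DEMO_VERBOSE && log_martians)
        logged = 1;

    return logged;
}
```

With an `#ifdef` around the `if` instead, the variable would be assigned but never read in configurations where the option is off, which is what triggers the W=1 warning quoted in the commit.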
    • ipv6: fib: hide unused 'pn' variable · 74043489
      Arnd Bergmann authored
      When CONFIG_IPV6_SUBTREES is disabled, the only user is hidden, causing
      a 'make W=1' warning:
      
      net/ipv6/ip6_fib.c: In function 'fib6_add':
      net/ipv6/ip6_fib.c:1388:32: error: variable 'pn' set but not used [-Werror=unused-but-set-variable]
      
      Add another #ifdef around the variable declaration, matching the other
      uses in this file.
      
      Fixes: 66729e18 ("[IPV6] ROUTE: Make sure we have fn->leaf when adding a node on subtree.")
      Link: https://lore.kernel.org/netdev/20240322131746.904943-1-arnd@kernel.org/
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240408074219.3030256-1-arnd@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • octeontx2-af: Fix NIX SQ mode and BP config · faf23006
      Geetha sowjanya authored
      NIX SQ mode and link backpressure configuration is required for
      all platforms, but in the current driver this code is wrongly placed
      under a platform-specific check. This patch fixes the issue by
      moving the code out of the platform check.
      
      Fixes: 5d9b976d ("octeontx2-af: Support fixed transmit scheduler topology")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Link: https://lore.kernel.org/r/20240408063643.26288-1-gakula@marvell.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • af_unix: Clear stale u->oob_skb. · b46f4eaa
      Kuniyuki Iwashima authored
      syzkaller started to report deadlock of unix_gc_lock after commit
      4090fa37 ("af_unix: Replace garbage collection algorithm."), but
      it just uncovers the bug that has been there since commit 314001f0
      ("af_unix: Add OOB support").
      
      The repro basically does the following.
      
        from socket import *
        from array import array
      
        c1, c2 = socketpair(AF_UNIX, SOCK_STREAM)
        c1.sendmsg([b'a'], [(SOL_SOCKET, SCM_RIGHTS, array("i", [c2.fileno()]))], MSG_OOB)
        c2.recv(1)  # blocked as no normal data in recv queue
      
        c2.close()  # done async and unblock recv()
        c1.close()  # done async and trigger GC
      
      A socket sends its file descriptor to itself as OOB data and tries to
      receive normal data, but finally recv() fails due to async close().
      
      The problem here is wrong handling of OOB skb in manage_oob().  When
      recvmsg() is called without MSG_OOB, manage_oob() is called to check
      if the peeked skb is OOB skb.  In such a case, manage_oob() pops it
      out of the receive queue but does not clear unix_sk(sk)->oob_skb.
      This is wrong in terms of uAPI.
      
      Let's say we send "hello" with MSG_OOB, and "world" without MSG_OOB.
      The 'o' is handled as OOB data.  When recv() is called twice without
      MSG_OOB, the OOB data should be lost.
      
        >>> from socket import *
        >>> c1, c2 = socketpair(AF_UNIX, SOCK_STREAM, 0)
        >>> c1.send(b'hello', MSG_OOB)  # 'o' is OOB data
        5
        >>> c1.send(b'world')
        5
        >>> c2.recv(5)  # OOB data is not received
        b'hell'
        >>> c2.recv(5)  # OOB data is skipped
        b'world'
        >>> c2.recv(5, MSG_OOB)  # This should return an error
        b'o'
      
      In the same situation, TCP actually returns -EINVAL for the last
      recv().
      
      Also, if we do not clear unix_sk(sk)->oob_skb, unix_poll() always sets
      EPOLLPRI even though the data has already been passed over by a previous recv().
      
      To avoid these issues, we must clear unix_sk(sk)->oob_skb when dequeuing
      it from recv queue.
      
      The reason why the old GC did not trigger the deadlock is because the
      old GC relied on the receive queue to detect the loop.
      
      When it is triggered, the socket with OOB data is marked as GC candidate
      because file refcount == inflight count (1).  However, after traversing
      all inflight sockets, the socket still has a positive inflight count (1),
      thus the socket is excluded from candidates.  Then, the old GC loses the
      chance to garbage-collect the socket.
      
      With the old GC, the repro continues to create true garbage that will
      never be freed nor detected by kmemleak as it's linked to the global
      inflight list.  That's why we couldn't even notice the issue.
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Reported-by: syzbot+7f7f201cc2668a8fd169@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=7f7f201cc2668a8fd169
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240405221057.2406-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b46f4eaa
    • Marek Vasut's avatar
      net: ks8851: Handle softirqs at the end of IRQ thread to fix hang · be0384bf
      Marek Vasut authored
      The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
      any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
      implementation is guarded by local_bh_disable() and local_bh_enable().
      The local_bh_enable() may call do_softirq() to run softirqs in case
      any are pending. One of the softirqs is net_rx_action, which ultimately
      reaches the driver .start_xmit callback. If that happens, the system
      hangs. The entire call chain is below:
      
      ks8851_start_xmit_par from netdev_start_xmit
      netdev_start_xmit from dev_hard_start_xmit
      dev_hard_start_xmit from sch_direct_xmit
      sch_direct_xmit from __dev_queue_xmit
      __dev_queue_xmit from __neigh_update
      __neigh_update from neigh_update
      neigh_update from arp_process.constprop.0
      arp_process.constprop.0 from __netif_receive_skb_one_core
      __netif_receive_skb_one_core from process_backlog
      process_backlog from __napi_poll.constprop.0
      __napi_poll.constprop.0 from net_rx_action
      net_rx_action from __do_softirq
      __do_softirq from call_with_stack
      call_with_stack from do_softirq
      do_softirq from __local_bh_enable_ip
      __local_bh_enable_ip from netif_rx
      netif_rx from ks8851_irq
      ks8851_irq from irq_thread_fn
      irq_thread_fn from irq_thread
      irq_thread from kthread
      kthread from ret_from_fork
      
      The hang happens because ks8851_irq() first locks a spinlock in
      ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
      and with that spinlock locked, calls netif_rx(). Once the execution
      reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
      which attempts to claim the already locked spinlock again, and the
      hang happens.
      
      Move the do_softirq() call outside of the spinlock protected section
      of ks8851_irq() by disabling BHs around the entire spinlock protected
      section of ks8851_irq() handler. Place local_bh_enable() outside of
      the spinlock protected section, so that it can trigger do_softirq()
      without the ks8851_par.c ks8851_lock_par() spinlock being held, and
      safely call ks8851_start_xmit_par() without attempting to lock the
      already locked spinlock.
      
      Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
      now, replace netif_rx() with __netif_rx() which is not duplicating the
      local_bh_disable()/local_bh_enable() calls.
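
The ordering problem described above can be sketched with a small single-threaded model. Everything here is an assumption made for illustration: `model_dev` and the `model_*` functions are hypothetical, the "lock" is a plain flag, and a would-be hang is recorded in `deadlocked` instead of actually spinning. It is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
    bool lock_held;       /* stands in for the non-recursive spinlock */
    int  bh_disabled;     /* nesting depth of BH-disable sections */
    bool softirq_pending; /* RX work queued for softirq context */
    bool deadlocked;      /* set where the real system would hang */
    int  xmit_done;       /* completed transmit calls */
};

/* Transmit path: needs the lock, like the .start_xmit callback. */
void model_start_xmit(struct model_dev *d)
{
    assert(d != NULL);
    if (d->lock_held) {   /* lock already taken by the IRQ thread */
        d->deadlocked = true;
        return;
    }
    d->lock_held = true;
    d->xmit_done++;
    d->lock_held = false;
}

void model_bh_disable(struct model_dev *d) { d->bh_disabled++; }

/* Leaving the outermost BH-disabled section runs pending softirqs,
 * which in this model end up in the transmit path. */
void model_bh_enable(struct model_dev *d)
{
    if (--d->bh_disabled == 0 && d->softirq_pending) {
        d->softirq_pending = false;
        model_start_xmit(d);
    }
}

/* netif_rx()-like: brackets itself with BH disable/enable. */
void model_netif_rx(struct model_dev *d)
{
    model_bh_disable(d);
    d->softirq_pending = true;
    model_bh_enable(d);
}

/* __netif_rx()-like: relies on the caller having disabled BHs. */
void model_netif_rx_nobh(struct model_dev *d)
{
    d->softirq_pending = true;
}

/* Old ordering: softirqs run while the lock is still held. */
void model_irq_old(struct model_dev *d)
{
    d->lock_held = true;
    model_netif_rx(d);
    d->lock_held = false;
}

/* New ordering: BHs are re-enabled only after the lock is released. */
void model_irq_new(struct model_dev *d)
{
    model_bh_disable(d);
    d->lock_held = true;
    model_netif_rx_nobh(d);
    d->lock_held = false;
    model_bh_enable(d);
}
```

Running `model_irq_old()` on a zeroed `model_dev` sets `deadlocked`, while `model_irq_new()` completes one transmit with the lock free, which is the behaviour the patch description aims for.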
      
      Fixes: 797047f8 ("net: ks8851: Implement Parallel bus operations")
      Signed-off-by: Marek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-2-marex@denx.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      be0384bf
    • Marek Vasut's avatar
      net: ks8851: Inline ks8851_rx_skb() · f96f7004
      Marek Vasut authored
      Both ks8851_rx_skb_par() and ks8851_rx_skb_spi() call netif_rx(skb),
      inline the netif_rx(skb) call directly into ks8851_common.c and drop
      the .rx_skb callback and ks8851_rx_skb() wrapper. This removes one
      indirect call from the driver, no functional change otherwise.
      Signed-off-by: Marek Vasut <marex@denx.de>
      Link: https://lore.kernel.org/r/20240405203204.82062-1-marex@denx.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f96f7004
  4. 08 Apr, 2024 12 commits
  5. 07 Apr, 2024 2 commits
    • Hariprasad Kelam's avatar
      octeontx2-pf: Fix transmit scheduler resource leak · bccb798e
      Hariprasad Kelam authored
      In order to support shaping and scheduling, the netdev driver
      allocates transmit schedulers upon class creation.
      
      The previous patch, which added support for round-robin scheduling, has
      a bug due to which the driver does not free transmit schedulers after
      class deletion.
      
      This patch fixes that leak.
      
      Fixes: 47a9656f ("octeontx2-pf: htb offload support for Round Robin scheduling")
      Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bccb798e
    • Breno Leitao's avatar
      virtio_net: Do not send RSS key if it is not supported · 059a49aa
      Breno Leitao authored
      There is a bug when setting the RSS options in virtio_net that can break
      the whole machine, getting the kernel into an infinite loop.
      
      Running the following command in any QEMU virtual machine with virtio-net
      will reproduce this problem:
      
          # ethtool -X eth0  hfunc toeplitz
      
      This is how the problem happens:
      
      1) ethtool_set_rxfh() calls virtnet_set_rxfh()
      
      2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
      
      3) virtnet_commit_rss_command() populates 4 entries for the rss
      scatter-gather
      
      4) Since the command above does not have a key, the last
      scatter-gather entry will be zero-sized, since rss_key_size == 0.
      sg_buf_size = vi->rss_key_size;
      
      5) This buffer is passed to QEMU, but QEMU is not happy with a buffer
      with zero length, and does the following in virtqueue_map_desc() (QEMU
      function):
      
        if (!sz) {
            virtio_error(vdev, "virtio: zero sized buffers are not allowed");
      
      6) virtio_error() (also a QEMU function) sets the device as broken
      
          vdev->broken = true;
      
      7) QEMU bails out and does not respond to this crazy kernel.
      
      8) The kernel is waiting for the response to come back (function
      virtnet_send_command())
      
      9) The kernel waits by doing the following:
      
            while (!virtqueue_get_buf(vi->cvq, &tmp) &&
      	     !virtqueue_is_broken(vi->cvq))
      	      cpu_relax();
      
      10) Neither of the functions above returns true, thus the kernel
      loops here forever. Keep in mind that virtqueue_is_broken() does
      not look at QEMU's `vdev->broken`, so it never realizes that the
      virtio device is broken on the QEMU side.
      
      Fix it by not sending RSS commands if the feature is not available in
      the device.
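
A minimal sketch of the guard described above, under stated assumptions: `model_vnet` and `model_commit_rss` are hypothetical names, the struct only mirrors the idea of a feature flag and a key size, and the returned error code is a choice made for this example. The actual driver change may be structured differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state for illustration only. */
struct model_vnet {
    bool   has_rss;       /* device negotiated the RSS feature */
    size_t rss_key_size;  /* key length advertised by the device */
    int    commands_sent; /* control commands handed to the device */
};

/* Refuse to build an RSS command when the feature is unavailable or
 * the key buffer would be zero-sized, instead of sending a
 * scatter-gather entry of length 0 to the device. */
int model_commit_rss(struct model_vnet *vi)
{
    assert(vi != NULL);

    if (!vi->has_rss || vi->rss_key_size == 0)
        return -EOPNOTSUPP;

    vi->commands_sent++;
    return 0;
}
```

The point of the sketch is only that the check happens before anything is queued, so the device never sees the zero-length buffer that step 5 describes.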
      
      Fixes: c7114b12 ("drivers/net/virtio_net: Added basic RSS support.")
      Cc: stable@vger.kernel.org
      Cc: qemu-devel@nongnu.org
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Reviewed-by: Heng Qi <hengqi@linux.alibaba.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      059a49aa
  6. 06 Apr, 2024 3 commits
    • Eric Dumazet's avatar
      xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING · 237f3cf1
      Eric Dumazet authored
      syzbot reported an illegal copy in xsk_setsockopt() [1]
      
      Make sure to validate setsockopt() @optlen parameter.
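
The report below shows a 4-byte read from a 2-byte buffer. A user-space sketch of the kind of length check that prevents this is given here; `model_copy_opt` and `model_set_ring_entries` are hypothetical helpers written for this example, not the kernel's `copy_from_sockptr()` or `xsk_setsockopt()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy a fixed-size option value only when the caller supplied at
 * least that many bytes; otherwise reject the request. */
int model_copy_opt(void *dst, size_t dst_size,
                   const void *optval, size_t optlen)
{
    assert(dst != NULL && optval != NULL);

    if (optlen < dst_size)
        return -EINVAL;

    memcpy(dst, optval, dst_size);
    return 0;
}

/* Example option handler: expects an int-sized value. */
int model_set_ring_entries(int *entries, const void *optval, size_t optlen)
{
    int val;
    int err = model_copy_opt(&val, sizeof(val), optval, optlen);

    if (err)
        return err;

    *entries = val;
    return 0;
}
```

With this check, a 2-byte `optval` is rejected with `-EINVAL` before any copy happens, rather than being read as if it were 4 bytes long.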
      
      [1]
      
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
       BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
       BUG: KASAN: slab-out-of-bounds in xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
      Read of size 4 at addr ffff888028c6cde3 by task syz-executor.0/7549
      
      CPU: 0 PID: 7549 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0x169/0x550 mm/kasan/report.c:488
        kasan_report+0x143/0x180 mm/kasan/report.c:601
        copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
        copy_from_sockptr include/linux/sockptr.h:55 [inline]
        xsk_setsockopt+0x909/0xa40 net/xdp/xsk.c:1420
        do_sock_setsockopt+0x3af/0x720 net/socket.c:2311
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      RIP: 0033:0x7fb40587de69
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb40665a0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00007fb4059abf80 RCX: 00007fb40587de69
      RDX: 0000000000000005 RSI: 000000000000011b RDI: 0000000000000006
      RBP: 00007fb4058ca47a R08: 0000000000000002 R09: 0000000000000000
      R10: 0000000020001980 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fb4059abf80 R15: 00007fff57ee4d08
       </TASK>
      
      Allocated by task 7549:
        kasan_save_stack mm/kasan/common.c:47 [inline]
        kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
        poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
        __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
        kasan_kmalloc include/linux/kasan.h:211 [inline]
        __do_kmalloc_node mm/slub.c:3966 [inline]
        __kmalloc+0x233/0x4a0 mm/slub.c:3979
        kmalloc include/linux/slab.h:632 [inline]
        __cgroup_bpf_run_filter_setsockopt+0xd2f/0x1040 kernel/bpf/cgroup.c:1869
        do_sock_setsockopt+0x6b4/0x720 net/socket.c:2293
        __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
        __do_sys_setsockopt net/socket.c:2343 [inline]
        __se_sys_setsockopt net/socket.c:2340 [inline]
        __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      The buggy address belongs to the object at ffff888028c6cde0
       which belongs to the cache kmalloc-8 of size 8
      The buggy address is located 1 bytes to the right of
       allocated 2-byte region [ffff888028c6cde0, ffff888028c6cde2)
      
      The buggy address belongs to the physical page:
      page:ffffea0000a31b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888028c6c9c0 pfn:0x28c6c
      anon flags: 0xfff00000000800(slab|node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xffffffff()
      raw: 00fff00000000800 ffff888014c41280 0000000000000000 dead000000000001
      raw: ffff888028c6c9c0 0000000080800057 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112cc0(GFP_USER|__GFP_NOWARN|__GFP_NORETRY), pid 6648, tgid 6644 (syz-executor.0), ts 133906047828, free_ts 133859922223
        set_page_owner include/linux/page_owner.h:31 [inline]
        post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
        prep_new_page mm/page_alloc.c:1540 [inline]
        get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
        __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
        __alloc_pages_node include/linux/gfp.h:238 [inline]
        alloc_pages_node include/linux/gfp.h:261 [inline]
        alloc_slab_page+0x5f/0x160 mm/slub.c:2175
        allocate_slab mm/slub.c:2338 [inline]
        new_slab+0x84/0x2f0 mm/slub.c:2391
        ___slab_alloc+0xc73/0x1260 mm/slub.c:3525
        __slab_alloc mm/slub.c:3610 [inline]
        __slab_alloc_node mm/slub.c:3663 [inline]
        slab_alloc_node mm/slub.c:3835 [inline]
        __do_kmalloc_node mm/slub.c:3965 [inline]
        __kmalloc_node+0x2db/0x4e0 mm/slub.c:3973
        kmalloc_node include/linux/slab.h:648 [inline]
        __vmalloc_area_node mm/vmalloc.c:3197 [inline]
        __vmalloc_node_range+0x5f9/0x14a0 mm/vmalloc.c:3392
        __vmalloc_node mm/vmalloc.c:3457 [inline]
        vzalloc+0x79/0x90 mm/vmalloc.c:3530
        bpf_check+0x260/0x19010 kernel/bpf/verifier.c:21162
        bpf_prog_load+0x1667/0x20f0 kernel/bpf/syscall.c:2895
        __sys_bpf+0x4ee/0x810 kernel/bpf/syscall.c:5631
        __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
        __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
        __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5736
       do_syscall_64+0xfb/0x240
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      page last free pid 6650 tgid 6647 stack trace:
        reset_page_owner include/linux/page_owner.h:24 [inline]
        free_pages_prepare mm/page_alloc.c:1140 [inline]
        free_unref_page_prepare+0x95d/0xa80 mm/page_alloc.c:2346
        free_unref_page_list+0x5a3/0x850 mm/page_alloc.c:2532
        release_pages+0x2117/0x2400 mm/swap.c:1042
        tlb_batch_pages_flush mm/mmu_gather.c:98 [inline]
        tlb_flush_mmu_free mm/mmu_gather.c:293 [inline]
        tlb_flush_mmu+0x34d/0x4e0 mm/mmu_gather.c:300
        tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:392
        exit_mmap+0x4b6/0xd40 mm/mmap.c:3300
        __mmput+0x115/0x3c0 kernel/fork.c:1345
        exit_mm+0x220/0x310 kernel/exit.c:569
        do_exit+0x99e/0x27e0 kernel/exit.c:865
        do_group_exit+0x207/0x2c0 kernel/exit.c:1027
        get_signal+0x176e/0x1850 kernel/signal.c:2907
        arch_do_signal_or_restart+0x96/0x860 arch/x86/kernel/signal.c:310
        exit_to_user_mode_loop kernel/entry/common.c:105 [inline]
        exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
        __syscall_exit_to_user_mode_work kernel/entry/common.c:201 [inline]
        syscall_exit_to_user_mode+0xc9/0x360 kernel/entry/common.c:212
        do_syscall_64+0x10a/0x240 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x6d/0x75
      
      Memory state around the buggy address:
       ffff888028c6cc80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6cd00: fa fc fc fc fa fc fc fc 00 fc fc fc 06 fc fc fc
      >ffff888028c6cd80: fa fc fc fc fa fc fc fc fa fc fc fc 02 fc fc fc
                                                             ^
       ffff888028c6ce00: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
       ffff888028c6ce80: fa fc fc fc fa fc fc fc fa fc fc fc fa fc fc fc
      
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: "Björn Töpel" <bjorn@kernel.org>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/20240404202738.3634547-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      237f3cf1
    • Petr Tesarik's avatar
      u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file · 38a15d0a
      Petr Tesarik authored
      Fix bogus lockdep warnings if multiple u64_stats_sync variables are
      initialized in the same file.
      
      With CONFIG_LOCKDEP, seqcount_init() is a macro which declares:
      
      	static struct lock_class_key __key;
      
      Since u64_stats_init() is a function (albeit an inline one), all calls
      within the same file end up using the same instance, effectively treating
      them all as a single lock-class.
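
The effect described above can be reproduced in plain user-space C. In this sketch, `model_key`, `model_sync`, `MODEL_SEQCOUNT_INIT`, `model_stats_init_fn` and `model_stats_init_macro` are hypothetical names that imitate the pattern; they are not the kernel's lockdep or u64_stats definitions. Making the initializer a macro again is shown as one possible way to get a key per call site.

```c
#include <assert.h>
#include <stddef.h>

struct model_key  { char unused; };
struct model_sync { const struct model_key *key; };

/* Macro form: every expansion site declares its own static key. */
#define MODEL_SEQCOUNT_INIT(s)                          \
    do {                                                \
        static struct model_key model_key_instance;     \
        (s)->key = &model_key_instance;                 \
    } while (0)

/* Function form: the macro expands exactly once, inside this body,
 * so every caller in the file ends up sharing one key. */
static inline void model_stats_init_fn(struct model_sync *s)
{
    assert(s != NULL);
    MODEL_SEQCOUNT_INIT(s);
}

/* Macro wrapper: expands at each call site, giving one key per site. */
#define model_stats_init_macro(s) MODEL_SEQCOUNT_INIT(s)
```

Two objects initialized through `model_stats_init_fn()` end up with the same key address, whereas two separate uses of `model_stats_init_macro()` get distinct keys. That difference is what makes a lock validator treat the function-initialized objects as a single class.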
      
      Fixes: 9464ca65 ("net: make u64_stats_init() a function")
      Closes: https://lore.kernel.org/netdev/ea1567d9-ce66-45e6-8168-ac40a47d1821@roeck-us.net/
      Signed-off-by: Petr Tesarik <petr@tesarici.cz>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240404075740.30682-1-petr@tesarici.cz
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      38a15d0a
    • Ilya Maximets's avatar
      net: openvswitch: fix unwanted error log on timeout policy probing · 4539f91f
      Ilya Maximets authored
      On startup, ovs-vswitchd probes different datapath features including
      support for timeout policies.  While probing, it tries to execute
      certain operations with OVS_PACKET_ATTR_PROBE or OVS_FLOW_ATTR_PROBE
      attributes set.  These attributes tell the openvswitch module to not
      log any errors when they occur as it is expected that some of the
      probes will fail.
      
      For some reason, setting the timeout policy ignores the PROBE attribute
      and logs a failure anyway.  This is causing the following kernel log
      on each re-start of ovs-vswitchd:
      
        kernel: Failed to associated timeout policy `ovs_test_tp'
      
      Fix that by using the same logging macro that all other messages are
      using.  The message will still be printed at info level when needed
      and will be rate limited, but with a net rate limiter instead of
      generic printk one.
      
      The nf_ct_set_timeout() itself will still print some info messages,
      but at least this change makes logging in openvswitch module more
      consistent.
      
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Link: https://lore.kernel.org/r/20240403203803.2137962-1-i.maximets@ovn.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      4539f91f
  7. 04 Apr, 2024 3 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c88b9b4c
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from netfilter, bluetooth and bpf.
      
        Fairly usual collection of driver and core fixes. The large selftest
        accompanying one of the fixes is also becoming a common occurrence.
      
        Current release - regressions:
      
         - ipv6: fix infinite recursion in fib6_dump_done()
      
         - net/rds: fix possible null-deref in newly added error path
      
        Current release - new code bugs:
      
         - net: do not consume a full cacheline for system_page_pool
      
         - bpf: fix bpf_arena-related file descriptor leaks in the verifier
      
         - drv: ice: fix freeing uninitialized pointers, fixing misuse of the
           newfangled __free() auto-cleanup
      
        Previous releases - regressions:
      
         - x86/bpf: fixes the BPF JIT with retbleed=stuff
      
         - xen-netfront: add missing skb_mark_for_recycle, fix page pool
           accounting leaks, revealed by recently added explicit warning
      
         - tcp: fix bind() regression for v6-only wildcard and v4-mapped-v6
           non-wildcard addresses
      
         - Bluetooth:
            - replace "hci_qca: Set BDA quirk bit if fwnode exists in DT" with
              better workarounds to un-break some buggy Qualcomm devices
            - set conn encrypted before conn establishes, fix re-connecting to
              some headsets which use slightly unusual sequence of msgs
      
         - mptcp:
            - prevent BPF accessing lowat from a subflow socket
            - don't account accept() of non-MPC client as fallback to TCP
      
         - drv: mana: fix Rx DMA datasize and skb_over_panic
      
         - drv: i40e: fix VF MAC filter removal
      
        Previous releases - always broken:
      
         - gro: various fixes related to UDP tunnels - netns crossing
           problems, incorrect checksum conversions, and incorrect packet
           transformations which may lead to panics
      
         - bpf: support deferring bpf_link dealloc to after RCU grace period
      
         - nf_tables:
            - release batch on table validation from abort path
            - release mutex after nft_gc_seq_end from abort path
            - flush pending destroy work before exit_net release
      
         - drv: r8169: skip DASH fw status checks when DASH is disabled"
      
      * tag 'net-6.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
        netfilter: validate user input for expected length
        net/sched: act_skbmod: prevent kernel-infoleak
        net: usb: ax88179_178a: avoid the interface always configured as random address
        net: dsa: sja1105: Fix parameters order in sja1110_pcs_mdio_write_c45()
        net: ravb: Always update error counters
        net: ravb: Always process TX descriptor ring
        netfilter: nf_tables: discard table flag update with pending basechain deletion
        netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
        netfilter: nf_tables: reject new basechain after table flag update
        netfilter: nf_tables: flush pending destroy work before exit_net release
        netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
        netfilter: nf_tables: release batch on table validation from abort path
        Revert "tg3: Remove residual error handling in tg3_suspend"
        tg3: Remove residual error handling in tg3_suspend
        net: mana: Fix Rx DMA datasize and skb_over_panic
        net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()
        net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestamping
        net: stmmac: fix rx queue priority assignment
        net: txgbe: fix i2c dev name cannot match clkdev
        net: fec: Set mac_managed_pm during probe
        ...
      c88b9b4c
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs · ec25bd8d
      Linus Torvalds authored
      Pull bcachefs repair code from Kent Overstreet:
       "A couple more small fixes, and new repair code.
      
        We can now automatically recover from arbitrary corrupted interior
        btree nodes by scanning, and we can reconstruct metadata as needed to
        bring a filesystem back into a working, consistent, read-write state
        and preserve access to whatever wasn't corrupted.
      
        Meaning - you can blow away all metadata except for extents and
        dirents leaf nodes, and repair will reconstruct everything else and
        give you your data, and under the correct paths. If inodes are missing
        i_size will be slightly off and permissions/ownership/timestamps will
        be gone, and we do still need the snapshots btree if snapshots were in
        use - in the future we'll be able to guess the snapshot tree structure
        in some situations.
      
        IOW - aside from shaking out remaining bugs (fuzz testing is still
        coming), repair code should be complete and if repair ever doesn't
        work that's the highest priority bug that I want to know about
        immediately.
      
        This patchset was kindly tested by a user from India who accidentally
        wiped one drive out of a three drive filesystem with no replication on
        the family computer - it took a couple weeks but we got everything
        important back"
      
      * tag 'bcachefs-2024-04-03' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: reconstruct_inode()
        bcachefs: Subvolume reconstruction
        bcachefs: Check for extents that point to same space
        bcachefs: Reconstruct missing snapshot nodes
        bcachefs: Flag btrees with missing data
        bcachefs: Topology repair now uses nodes found by scanning to fill holes
        bcachefs: Repair pass for scanning for btree nodes
        bcachefs: Don't skip fake btree roots in fsck
        bcachefs: bch2_btree_root_alloc() -> bch2_btree_root_alloc_fake()
        bcachefs: Etyzinger cleanups
        bcachefs: bch2_shoot_down_journal_keys()
        bcachefs: Clear recovery_passes_required as they complete without errors
        bcachefs: ratelimit informational fsck errors
        bcachefs: Check for bad needs_discard before doing discard
        bcachefs: Improve bch2_btree_update_to_text()
        mean_and_variance: Drop always failing tests
        bcachefs: fix nocow lock deadlock
        bcachefs: BCH_WATERMARK_interior_updates
        bcachefs: Fix btree node reserve
      ec25bd8d
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1cfa2f10
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2024-04-04
      
      We've added 7 non-merge commits during the last 5 day(s) which contain
      a total of 9 files changed, 75 insertions(+), 24 deletions(-).
      
      The main changes are:
      
      1) Fix x86 BPF JIT under retbleed=stuff which causes kernel panics due to
         incorrect destination IP calculation and incorrect IP for relocations,
         from Uros Bizjak and Joan Bruguera Micó.
      
      2) Fix BPF arena file descriptor leaks in the verifier,
         from Anton Protopopov.
      
      3) Defer bpf_link deallocation to after RCU grace period as currently
         running multi-{kprobes,uprobes} programs might still access cookie
         information from the link, from Andrii Nakryiko.
      
      4) Fix a BPF sockmap lock inversion deadlock in map_delete_elem reported
         by syzkaller, from Jakub Sitnicki.
      
      5) Fix resolve_btfids build with musl libc due to missing linux/types.h
         include, from Natanael Copa.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, sockmap: Prevent lock inversion deadlock in map delete elem
        x86/bpf: Fix IP for relocating call depth accounting
        x86/bpf: Fix IP after emitting call depth accounting
        bpf: fix possible file descriptor leaks in verifier
        tools/resolve_btfids: fix build with musl libc
        bpf: support deferring bpf_link dealloc to after RCU grace period
        bpf: put uprobe link's path and task in release callback
      ====================
      
      Link: https://lore.kernel.org/r/20240404183258.4401-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1cfa2f10