1. 23 May, 2023 14 commits
    • net/mlx5: E-switch, Devcom, sync devcom events and devcom comp register · 8c253dfc
      Shay Drory authored
      Devcom events are sent to all registered components. Following the
      cited patch, it is possible for two components, e.g. two eswitches,
      to send devcom events while both components are registered. This
      means the eswitch layer will do double un/pairing, i.e. double
      allocation and free of resources, even though only one un/pairing is
      needed. Flow example:
      
      	cpu0					cpu1
      	----					----
      
       mlx5_devlink_eswitch_mode_set(dev0)
        esw_offloads_devcom_init()
         mlx5_devcom_register_component(esw0)
                                               mlx5_devlink_eswitch_mode_set(dev1)
                                                esw_offloads_devcom_init()
                                                 mlx5_devcom_register_component(esw1)
                                                 mlx5_devcom_send_event()
         mlx5_devcom_send_event()
      
      Hence, check whether the eswitches are already un/paired before
      allocating or freeing resources.
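
The check described above can be sketched in user-space C as follows. This is a minimal, single-threaded illustration under stated assumptions: `struct esw_sketch`, its fields, and both functions are hypothetical names, not the driver's actual structures, and the locking that the real fix needs between event sending and component registration is not modelled.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an eswitch; not the real struct mlx5_eswitch. */
struct esw_sketch {
    bool paired;      /* true once pairing resources are allocated */
    int *peer_res;    /* placeholder for the per-pair resources */
    int alloc_count;  /* counts allocations, for demonstration only */
    int free_count;   /* counts frees, for demonstration only */
};

/* Pair only if not already paired, so a duplicate event is a no-op. */
int esw_sketch_pair(struct esw_sketch *esw)
{
    if (esw->paired)
        return 0;
    esw->peer_res = malloc(sizeof(*esw->peer_res));
    if (!esw->peer_res)
        return -1;
    esw->alloc_count++;
    esw->paired = true;
    return 0;
}

/* Unpair only if currently paired, so a duplicate event frees nothing. */
void esw_sketch_unpair(struct esw_sketch *esw)
{
    if (!esw->paired)
        return;
    free(esw->peer_res);
    esw->peer_res = NULL;
    esw->free_count++;
    esw->paired = false;
}
```

With this shape, a second pair or unpair event arriving for the same eswitch returns early instead of allocating or freeing a second time.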
      
      Fixes: 09b27846 ("net: devlink: enable parallel ops on netlink interface")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Fix using eswitch mapping in nic mode · dfa1e46d
      Paul Blakey authored
      The cited patch uses the eswitch object mapping pool while
      in NIC mode, where it isn't initialized. This results in the
      trace below [0].
      
      Fix that by using either the NIC or the eswitch object mapping pool,
      depending on whether the eswitch is enabled.
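
A minimal sketch of the selection described above, assuming a device object that carries one pool per mode. All names here (`struct dev_sketch`, `obj_mapping_pool`, the field names) are hypothetical and chosen for illustration; they are not the driver's identifiers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical placeholder for an object mapping pool. */
struct mapping_pool_sketch {
    const char *name;
};

/* Hypothetical device state: the eswitch pool is assumed to be valid only
 * while the eswitch is enabled, the NIC pool only needs to be valid in
 * NIC mode. */
struct dev_sketch {
    bool eswitch_enabled;
    struct mapping_pool_sketch *nic_pool;
    struct mapping_pool_sketch *esw_pool;
};

/* Pick the pool that matches the current mode instead of always using
 * the eswitch one. */
struct mapping_pool_sketch *obj_mapping_pool(const struct dev_sketch *dev)
{
    return dev->eswitch_enabled ? dev->esw_pool : dev->nic_pool;
}
```

Callers that previously dereferenced the eswitch pool unconditionally would go through the selector, so NIC mode never touches the uninitialized pool.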
      
      [0]:
      [  826.446057] ==================================================================
      [  826.446729] BUG: KASAN: slab-use-after-free in mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
      [  826.447515] Read of size 8 at addr ffff888194485830 by task tc/6233
      
      [  826.448243] CPU: 16 PID: 6233 Comm: tc Tainted: G        W          6.3.0-rc6+ #1
      [  826.448890] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  826.449785] Call Trace:
      [  826.450052]  <TASK>
      [  826.450302]  dump_stack_lvl+0x33/0x50
      [  826.450650]  print_report+0xc2/0x610
      [  826.450998]  ? __virt_addr_valid+0xb1/0x130
      [  826.451385]  ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
      [  826.451935]  kasan_report+0xae/0xe0
      [  826.452276]  ? mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
      [  826.452829]  mlx5_add_flow_rules+0x30/0x490 [mlx5_core]
      [  826.453368]  ? __kmalloc_node+0x5a/0x120
      [  826.453733]  esw_add_restore_rule+0x20f/0x270 [mlx5_core]
      [  826.454288]  ? mlx5_eswitch_add_send_to_vport_meta_rule+0x260/0x260 [mlx5_core]
      [  826.455011]  ? mutex_unlock+0x80/0xd0
      [  826.455361]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
      [  826.455862]  ? mapping_add+0x2cb/0x440 [mlx5_core]
      [  826.456425]  mlx5e_tc_action_miss_mapping_get+0x139/0x180 [mlx5_core]
      [  826.457058]  ? mlx5e_tc_update_skb_nic+0xb0/0xb0 [mlx5_core]
      [  826.457636]  ? __kasan_kmalloc+0x77/0x90
      [  826.458000]  ? __kmalloc+0x57/0x120
      [  826.458336]  mlx5_tc_ct_flow_offload+0x325/0xe40 [mlx5_core]
      [  826.458916]  ? ct_kernel_enter.constprop.0+0x48/0xa0
      [  826.459360]  ? mlx5_tc_ct_parse_action+0xf0/0xf0 [mlx5_core]
      [  826.459933]  ? mlx5e_mod_hdr_attach+0x491/0x520 [mlx5_core]
      [  826.460507]  ? mlx5e_mod_hdr_get+0x12/0x20 [mlx5_core]
      [  826.461046]  ? mlx5e_tc_attach_mod_hdr+0x154/0x170 [mlx5_core]
      [  826.461635]  mlx5e_configure_flower+0x969/0x2110 [mlx5_core]
      [  826.462217]  ? _raw_spin_lock_bh+0x85/0xe0
      [  826.462597]  ? __mlx5e_add_fdb_flow+0x750/0x750 [mlx5_core]
      [  826.463163]  ? kasan_save_stack+0x2e/0x40
      [  826.463534]  ? down_read+0x115/0x1b0
      [  826.463878]  ? down_write_killable+0x110/0x110
      [  826.464288]  ? tc_setup_action.part.0+0x9f/0x3b0
      [  826.464701]  ? mlx5e_is_uplink_rep+0x4c/0x90 [mlx5_core]
      [  826.465253]  ? mlx5e_tc_reoffload_flows_work+0x130/0x130 [mlx5_core]
      [  826.465878]  tc_setup_cb_add+0x112/0x250
      [  826.466247]  fl_hw_replace_filter+0x230/0x310 [cls_flower]
      [  826.466724]  ? fl_hw_destroy_filter+0x1a0/0x1a0 [cls_flower]
      [  826.467212]  fl_change+0x14e1/0x2030 [cls_flower]
      [  826.467636]  ? sock_def_readable+0x89/0x120
      [  826.468019]  ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
      [  826.468509]  ? kasan_unpoison+0x23/0x50
      [  826.468873]  ? get_random_u16+0x180/0x180
      [  826.469244]  ? __radix_tree_lookup+0x2b/0x130
      [  826.469640]  ? fl_get+0x7b/0x140 [cls_flower]
      [  826.470042]  ? fl_mask_put+0x200/0x200 [cls_flower]
      [  826.470478]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
      [  826.470973]  ? fl_tmplt_create+0x2d0/0x2d0 [cls_flower]
      [  826.471427]  tc_new_tfilter+0x644/0x1050
      [  826.471795]  ? tc_get_tfilter+0x860/0x860
      [  826.472170]  ? __thaw_task+0x130/0x130
      [  826.472525]  ? arch_stack_walk+0x98/0xf0
      [  826.472892]  ? cap_capable+0x9f/0xd0
      [  826.473235]  ? security_capable+0x47/0x60
      [  826.473608]  rtnetlink_rcv_msg+0x1d5/0x550
      [  826.473985]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
      [  826.474383]  ? __stack_depot_save+0x35/0x4c0
      [  826.474779]  ? kasan_save_stack+0x2e/0x40
      [  826.475149]  ? kasan_save_stack+0x1e/0x40
      [  826.475518]  ? __kasan_record_aux_stack+0x9f/0xb0
      [  826.475939]  ? task_work_add+0x77/0x1c0
      [  826.476305]  netlink_rcv_skb+0xe0/0x210
      [  826.476661]  ? rtnl_calcit.isra.0+0x1f0/0x1f0
      [  826.477057]  ? netlink_ack+0x7c0/0x7c0
      [  826.477412]  ? rhashtable_jhash2+0xef/0x150
      [  826.477796]  ? _copy_from_iter+0x105/0x770
      [  826.484386]  netlink_unicast+0x346/0x490
      [  826.484755]  ? netlink_attachskb+0x400/0x400
      [  826.485145]  ? kernel_text_address+0xc2/0xd0
      [  826.485535]  netlink_sendmsg+0x3b0/0x6c0
      [  826.485902]  ? kernel_text_address+0xc2/0xd0
      [  826.486296]  ? netlink_unicast+0x490/0x490
      [  826.486671]  ? iovec_from_user.part.0+0x7a/0x1a0
      [  826.487083]  ? netlink_unicast+0x490/0x490
      [  826.487461]  sock_sendmsg+0x73/0xc0
      [  826.487803]  ____sys_sendmsg+0x364/0x380
      [  826.488186]  ? import_iovec+0x7/0x10
      [  826.488531]  ? kernel_sendmsg+0x30/0x30
      [  826.488893]  ? __copy_msghdr+0x180/0x180
      [  826.489258]  ? kasan_save_stack+0x2e/0x40
      [  826.489629]  ? kasan_save_stack+0x1e/0x40
      [  826.490002]  ? __kasan_record_aux_stack+0x9f/0xb0
      [  826.490424]  ? __call_rcu_common.constprop.0+0x46/0x580
      [  826.490876]  ___sys_sendmsg+0xdf/0x140
      [  826.491231]  ? copy_msghdr_from_user+0x110/0x110
      [  826.491649]  ? fget_raw+0x120/0x120
      [  826.491988]  ? ___sys_recvmsg+0xd9/0x130
      [  826.492355]  ? folio_batch_add_and_move+0x80/0xa0
      [  826.492776]  ? _raw_spin_lock+0x7a/0xd0
      [  826.493137]  ? _raw_spin_lock+0x7a/0xd0
      [  826.493500]  ? _raw_read_lock_irq+0x30/0x30
      [  826.493880]  ? kasan_set_track+0x21/0x30
      [  826.494249]  ? kasan_save_free_info+0x2a/0x40
      [  826.494650]  ? do_sys_openat2+0xff/0x270
      [  826.495016]  ? __fget_light+0x1b5/0x200
      [  826.495377]  ? __virt_addr_valid+0xb1/0x130
      [  826.495763]  __sys_sendmsg+0xb2/0x130
      [  826.496118]  ? __sys_sendmsg_sock+0x20/0x20
      [  826.496501]  ? __x64_sys_rseq+0x2e0/0x2e0
      [  826.496874]  ? do_user_addr_fault+0x276/0x820
      [  826.497273]  ? fpregs_assert_state_consistent+0x52/0x60
      [  826.497727]  ? exit_to_user_mode_prepare+0x30/0x120
      [  826.498158]  do_syscall_64+0x3d/0x90
      [  826.498502]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  826.498949] RIP: 0033:0x7f9b67f4f887
      [  826.499294] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
      [  826.500742] RSP: 002b:00007fff5d1a5498 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  826.501395] RAX: ffffffffffffffda RBX: 0000000064413ce6 RCX: 00007f9b67f4f887
      [  826.501975] RDX: 0000000000000000 RSI: 00007fff5d1a5500 RDI: 0000000000000003
      [  826.502556] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
      [  826.503135] R10: 00007f9b67e08708 R11: 0000000000000246 R12: 0000000000000001
      [  826.503714] R13: 0000000000000001 R14: 00007fff5d1a9800 R15: 0000000000485400
      [  826.504304]  </TASK>
      
      [  826.504753] Allocated by task 3764:
      [  826.505090]  kasan_save_stack+0x1e/0x40
      [  826.505453]  kasan_set_track+0x21/0x30
      [  826.505810]  __kasan_kmalloc+0x77/0x90
      [  826.506164]  __mlx5_create_flow_table+0x16d/0xbb0 [mlx5_core]
      [  826.506742]  esw_offloads_enable+0x60d/0xfb0 [mlx5_core]
      [  826.507292]  mlx5_eswitch_enable_locked+0x4d3/0x680 [mlx5_core]
      [  826.507885]  mlx5_devlink_eswitch_mode_set+0x2a3/0x580 [mlx5_core]
      [  826.508513]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
      [  826.508969]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
      [  826.509427]  genl_rcv_msg+0x28d/0x3e0
      [  826.509772]  netlink_rcv_skb+0xe0/0x210
      [  826.510133]  genl_rcv+0x24/0x40
      [  826.510448]  netlink_unicast+0x346/0x490
      [  826.510810]  netlink_sendmsg+0x3b0/0x6c0
      [  826.511179]  sock_sendmsg+0x73/0xc0
      [  826.511519]  __sys_sendto+0x18d/0x220
      [  826.511867]  __x64_sys_sendto+0x72/0x80
      [  826.512232]  do_syscall_64+0x3d/0x90
      [  826.512576]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  826.513220] Freed by task 5674:
      [  826.513535]  kasan_save_stack+0x1e/0x40
      [  826.513893]  kasan_set_track+0x21/0x30
      [  826.514245]  kasan_save_free_info+0x2a/0x40
      [  826.514629]  ____kasan_slab_free+0x11a/0x1b0
      [  826.515021]  __kmem_cache_free+0x14d/0x280
      [  826.515399]  tree_put_node+0x109/0x1c0 [mlx5_core]
      [  826.515907]  mlx5_destroy_flow_table+0x119/0x630 [mlx5_core]
      [  826.516481]  esw_offloads_steering_cleanup+0xe7/0x150 [mlx5_core]
      [  826.517084]  esw_offloads_disable+0xe0/0x160 [mlx5_core]
      [  826.517632]  mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
      [  826.518225]  mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
      [  826.518834]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
      [  826.519286]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
      [  826.519748]  genl_rcv_msg+0x28d/0x3e0
      [  826.520101]  netlink_rcv_skb+0xe0/0x210
      [  826.520458]  genl_rcv+0x24/0x40
      [  826.520771]  netlink_unicast+0x346/0x490
      [  826.521137]  netlink_sendmsg+0x3b0/0x6c0
      [  826.521505]  sock_sendmsg+0x73/0xc0
      [  826.521842]  __sys_sendto+0x18d/0x220
      [  826.522191]  __x64_sys_sendto+0x72/0x80
      [  826.522554]  do_syscall_64+0x3d/0x90
      [  826.522894]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  826.523540] Last potentially related work creation:
      [  826.523969]  kasan_save_stack+0x1e/0x40
      [  826.524331]  __kasan_record_aux_stack+0x9f/0xb0
      [  826.524739]  insert_work+0x30/0x130
      [  826.525078]  __queue_work+0x34b/0x690
      [  826.525426]  queue_work_on+0x48/0x50
      [  826.525766]  __rhashtable_remove_fast_one+0x4af/0x4d0 [mlx5_core]
      [  826.526365]  del_sw_flow_group+0x1b5/0x270 [mlx5_core]
      [  826.526898]  tree_put_node+0x109/0x1c0 [mlx5_core]
      [  826.527407]  esw_offloads_steering_cleanup+0xd3/0x150 [mlx5_core]
      [  826.528009]  esw_offloads_disable+0xe0/0x160 [mlx5_core]
      [  826.528616]  mlx5_eswitch_disable_locked+0x26c/0x290 [mlx5_core]
      [  826.529218]  mlx5_devlink_eswitch_mode_set+0x128/0x580 [mlx5_core]
      [  826.529823]  devlink_nl_cmd_eswitch_set_doit+0xdf/0x1f0
      [  826.530276]  genl_family_rcv_msg_doit.isra.0+0x146/0x1c0
      [  826.530733]  genl_rcv_msg+0x28d/0x3e0
      [  826.531079]  netlink_rcv_skb+0xe0/0x210
      [  826.531439]  genl_rcv+0x24/0x40
      [  826.531755]  netlink_unicast+0x346/0x490
      [  826.532123]  netlink_sendmsg+0x3b0/0x6c0
      [  826.532487]  sock_sendmsg+0x73/0xc0
      [  826.532825]  __sys_sendto+0x18d/0x220
      [  826.533175]  __x64_sys_sendto+0x72/0x80
      [  826.533533]  do_syscall_64+0x3d/0x90
      [  826.533877]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      [  826.534521] The buggy address belongs to the object at ffff888194485800
                      which belongs to the cache kmalloc-512 of size 512
      [  826.535506] The buggy address is located 48 bytes inside of
                      freed 512-byte region [ffff888194485800, ffff888194485a00)
      
      [  826.536666] The buggy address belongs to the physical page:
      [  826.537138] page:00000000d75841dd refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x194480
      [  826.537915] head:00000000d75841dd order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      [  826.538595] flags: 0x200000000010200(slab|head|node=0|zone=2)
      [  826.539089] raw: 0200000000010200 ffff888100042c80 ffffea0004523800 dead000000000002
      [  826.539755] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
      [  826.540417] page dumped because: kasan: bad access detected
      
      [  826.541095] Memory state around the buggy address:
      [  826.541519]  ffff888194485700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  826.542149]  ffff888194485780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  826.542773] >ffff888194485800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  826.543400]                                      ^
      [  826.543822]  ffff888194485880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  826.544452]  ffff888194485900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  826.545079] ==================================================================
      
      Fixes: 67027828 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix SQ wake logic in ptp napi_poll context · 7aa50380
      Rahul Rameshbabu authored
      Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken
      up. Before this change, the ptp tx sq might never wake up if the ptp tx ts skb
      fifo was full when mlx5e_poll_tx_cq checked whether the queue should be woken up.
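
The stuck-queue scenario can be modelled with a small user-space sketch. It assumes, for illustration only, that a stopped queue may be woken once both the send queue and the timestamp skb fifo have room; the struct, the counters, and the `check_wake` toggle are hypothetical and exist only to compare the behaviour before and after the change.

```c
#include <stdbool.h>

/* Hypothetical model of a ptp tx sq. */
struct ptpsq_sketch {
    int wq_free;    /* free send-queue slots */
    int fifo_free;  /* free slots in the timestamp skb fifo */
    bool stopped;   /* true while the queue is stopped */
};

/* Assumed wake condition: room in both the queue and the fifo. */
bool ptpsq_has_room(const struct ptpsq_sketch *sq)
{
    return sq->wq_free > 0 && sq->fifo_free > 0;
}

void ptpsq_maybe_wake(struct ptpsq_sketch *sq)
{
    if (sq->stopped && ptpsq_has_room(sq))
        sq->stopped = false;
}

/* TX completion path: releases a queue slot, then checks for wake. */
void poll_tx_cq_sketch(struct ptpsq_sketch *sq)
{
    sq->wq_free++;
    ptpsq_maybe_wake(sq);
}

/* Timestamp completion path: releases a fifo slot. check_wake selects
 * between the old behaviour (no check) and the new one (check here too). */
void poll_ts_cq_sketch(struct ptpsq_sketch *sq, bool check_wake)
{
    sq->fifo_free++;
    if (check_wake)
        ptpsq_maybe_wake(sq);
}
```

If the TX completion runs while the fifo is still full, its wake check fails; without a second check in the timestamp completion path nothing revisits the decision, which is the hang the commit describes.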
      
      Fixes: 1880bc4e ("net/mlx5e: Add TX port timestamp support")
      Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix deadlock in tc route query code · 691c041b
      Vlad Buslov authored
      The cited commit causes an ABBA deadlock [0] when peer flows are created while
      holding the devcom rw semaphore. Due to the peer flows offload implementation,
      the lock is taken much higher up the call chain and there is no obvious way
      to easily fix the deadlock. Instead, since the tc route query code needs the
      peer eswitch structure only to perform a lookup in an xarray and doesn't
      perform any sleeping operations with it, refactor the code for lockless
      execution in the following ways:
      
      - RCUify the devcom 'data' pointer. When resetting the pointer,
      synchronously wait for an RCU grace period before returning. This is fine
      since devcom is currently only used for synchronization of
      pairing/unpairing of eswitches, which is rare and already expensive as-is.
      
      - Wrap all usages of the 'paired' boolean in {READ|WRITE}_ONCE(). The flag has
      already been used in some unlocked contexts without proper
      annotations (e.g. users of the mlx5_devcom_is_paired() function), but it wasn't
      an issue since all relevant code paths checked it again after obtaining the
      devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as a
      "best effort" check to return NULL when devcom is being unpaired. Note that
      while the RCU read lock doesn't prevent the 'paired' flag from being changed
      concurrently, it still guarantees that the reader can continue to use 'data'.
      
      - Refactor the mlx5e_tc_query_route_vport() function to use the new
      mlx5_devcom_get_peer_data_rcu() API, which fixes the deadlock.
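
The reader-side shape of the lockless lookup can be approximated in user-space C11. This is only a sketch under loud assumptions: C11 atomics stand in for READ_ONCE()/WRITE_ONCE() and for the RCU pointer accessors, the RCU read-side critical section and the grace-period wait are not modelled at all, and every name below is hypothetical rather than taken from the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a devcom component. */
struct devcom_sketch {
    atomic_bool paired;    /* stands in for the READ_ONCE/WRITE_ONCE flag */
    _Atomic(void *) data;  /* stands in for the RCU-protected 'data' pointer */
};

/* Reader: best-effort check of 'paired', then load the pointer without
 * taking any semaphore. In the kernel this would run inside an RCU
 * read-side critical section, which is not modelled here. */
void *devcom_sketch_get_peer_data(struct devcom_sketch *c)
{
    if (!atomic_load_explicit(&c->paired, memory_order_relaxed))
        return NULL;
    return atomic_load_explicit(&c->data, memory_order_acquire);
}

/* Writer: publish or reset the pointer, then update the flag. A kernel
 * implementation would additionally wait for a grace period after a
 * reset, before the old data may be freed; that wait is omitted here. */
void devcom_sketch_set_paired(struct devcom_sketch *c, void *data, bool paired)
{
    atomic_store_explicit(&c->data, data, memory_order_release);
    atomic_store_explicit(&c->paired, paired, memory_order_relaxed);
}
```

Because the reader never blocks on the semaphore, it cannot take part in the lock-order cycle shown in the trace below; the cost moves to the rare unpair path, which has to wait for readers to finish.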
      
      [0]:
      
      [  164.599612] ======================================================
      [  164.600142] WARNING: possible circular locking dependency detected
      [  164.600667] 6.3.0-rc3+ #1 Not tainted
      [  164.601021] ------------------------------------------------------
      [  164.601557] handler1/3456 is trying to acquire lock:
      [  164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.603078]
                     but task is already holding lock:
      [  164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
      [  164.604459]
                     which lock already depends on the new lock.
      
      [  164.605190]
                     the existing dependency chain (in reverse order) is:
      [  164.605848]
                     -> #1 (&comp->sem){++++}-{3:3}:
      [  164.606380]        down_read+0x39/0x50
      [  164.606772]        mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
      [  164.607336]        mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core]
      [  164.607914]        mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core]
      [  164.608495]        mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core]
      [  164.609063]        mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core]
      [  164.609627]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
      [  164.610175]        mlx5e_configure_flower+0x952/0x1a20 [mlx5_core]
      [  164.610741]        tc_setup_cb_add+0xd4/0x200
      [  164.611146]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  164.611661]        fl_change+0xc95/0x18a0 [cls_flower]
      [  164.612116]        tc_new_tfilter+0x3fc/0xd20
      [  164.612516]        rtnetlink_rcv_msg+0x418/0x5b0
      [  164.612936]        netlink_rcv_skb+0x54/0x100
      [  164.613339]        netlink_unicast+0x190/0x250
      [  164.613746]        netlink_sendmsg+0x245/0x4a0
      [  164.614150]        sock_sendmsg+0x38/0x60
      [  164.614522]        ____sys_sendmsg+0x1d0/0x1e0
      [  164.614934]        ___sys_sendmsg+0x80/0xc0
      [  164.615320]        __sys_sendmsg+0x51/0x90
      [  164.615701]        do_syscall_64+0x3d/0x90
      [  164.616083]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  164.616568]
                     -> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}:
      [  164.617210]        __lock_acquire+0x159e/0x26e0
      [  164.617638]        lock_acquire+0xc2/0x2a0
      [  164.618018]        __mutex_lock+0x92/0xcd0
      [  164.618401]        mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.618943]        post_process_attr+0x153/0x2d0 [mlx5_core]
      [  164.619471]        mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
      [  164.620021]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
      [  164.620564]        mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
      [  164.621125]        tc_setup_cb_add+0xd4/0x200
      [  164.621531]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  164.622047]        fl_change+0xc95/0x18a0 [cls_flower]
      [  164.622500]        tc_new_tfilter+0x3fc/0xd20
      [  164.622906]        rtnetlink_rcv_msg+0x418/0x5b0
      [  164.623324]        netlink_rcv_skb+0x54/0x100
      [  164.623727]        netlink_unicast+0x190/0x250
      [  164.624138]        netlink_sendmsg+0x245/0x4a0
      [  164.624544]        sock_sendmsg+0x38/0x60
      [  164.624919]        ____sys_sendmsg+0x1d0/0x1e0
      [  164.625340]        ___sys_sendmsg+0x80/0xc0
      [  164.625731]        __sys_sendmsg+0x51/0x90
      [  164.626117]        do_syscall_64+0x3d/0x90
      [  164.626502]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  164.626995]
                     other info that might help us debug this:
      
      [  164.627725]  Possible unsafe locking scenario:
      
      [  164.628268]        CPU0                    CPU1
      [  164.628683]        ----                    ----
      [  164.629098]   lock(&comp->sem);
      [  164.629421]                                lock(&esw->offloads.encap_tbl_lock);
      [  164.630066]                                lock(&comp->sem);
      [  164.630555]   lock(&esw->offloads.encap_tbl_lock);
      [  164.630993]
                      *** DEADLOCK ***
      
      [  164.631575] 3 locks held by handler1/3456:
      [  164.631962]  #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
      [  164.632703]  #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
      [  164.633552]  #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
      [  164.634435]
                     stack backtrace:
      [  164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1
      [  164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  164.636340] Call Trace:
      [  164.636616]  <TASK>
      [  164.636863]  dump_stack_lvl+0x47/0x70
      [  164.637217]  check_noncircular+0xfe/0x110
      [  164.637601]  __lock_acquire+0x159e/0x26e0
      [  164.637977]  ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core]
      [  164.638472]  lock_acquire+0xc2/0x2a0
      [  164.638828]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.639339]  ? lock_is_held_type+0x98/0x110
      [  164.639728]  __mutex_lock+0x92/0xcd0
      [  164.640074]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.640576]  ? __lock_acquire+0x382/0x26e0
      [  164.640958]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.641468]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.641965]  mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
      [  164.642454]  ? lock_release+0xbf/0x240
      [  164.642819]  post_process_attr+0x153/0x2d0 [mlx5_core]
      [  164.643318]  mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
      [  164.643835]  __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
      [  164.644340]  mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
      [  164.644862]  ? lock_acquire+0xc2/0x2a0
      [  164.645219]  tc_setup_cb_add+0xd4/0x200
      [  164.645588]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
      [  164.646067]  fl_change+0xc95/0x18a0 [cls_flower]
      [  164.646488]  tc_new_tfilter+0x3fc/0xd20
      [  164.646861]  ? tc_del_tfilter+0x810/0x810
      [  164.647236]  rtnetlink_rcv_msg+0x418/0x5b0
      [  164.647621]  ? rtnl_setlink+0x160/0x160
      [  164.647982]  netlink_rcv_skb+0x54/0x100
      [  164.648348]  netlink_unicast+0x190/0x250
      [  164.648722]  netlink_sendmsg+0x245/0x4a0
      [  164.649090]  sock_sendmsg+0x38/0x60
      [  164.649434]  ____sys_sendmsg+0x1d0/0x1e0
      [  164.649804]  ? copy_msghdr_from_user+0x6d/0xa0
      [  164.650213]  ___sys_sendmsg+0x80/0xc0
      [  164.650563]  ? lock_acquire+0xc2/0x2a0
      [  164.650926]  ? lock_acquire+0xc2/0x2a0
      [  164.651286]  ? __fget_files+0x5/0x190
      [  164.651644]  ? find_held_lock+0x2b/0x80
      [  164.652006]  ? __fget_files+0xb9/0x190
      [  164.652365]  ? lock_release+0xbf/0x240
      [  164.652723]  ? __fget_files+0xd3/0x190
      [  164.653079]  __sys_sendmsg+0x51/0x90
      [  164.653435]  do_syscall_64+0x3d/0x90
      [  164.653784]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  164.654229] RIP: 0033:0x7f378054f8bd
      [  164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48
      [  164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
      [  164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd
      [  164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014
      [  164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c
      [  164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0
      [  164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540
      
      Fixes: f9d196bd ("net/mlx5e: Use correct eswitch for stack devices with lag")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix error message when failing to allocate device memory · a6573514
      Roi Dayan authored
      Fix the spacing in the error message and use the correct error code pointer.
      
      Fixes: c9b9dcb4 ("net/mlx5: Move device memory management to mlx5_core")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Use correct encap attribute during invalidation · be071cdb
      Vlad Buslov authored
      With the introduction of the post action infrastructure, most users of the encap
      attribute were modified to obtain the correct attribute by
      calling the mlx5e_tc_get_encap_attr() helper instead of assuming the encap action
      is always on the default attribute. However, the cited commit didn't modify
      mlx5e_invalidate_encap(), which prevents it from destroying the correct modify
      header action and leads to a warning [0]. Fix the issue by using the correct
      attribute.
      
      [0]:
      
      Feb 21 09:47:35 c-237-177-40-045 kernel: WARNING: CPU: 17 PID: 654 at drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:684 mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
      Feb 21 09:47:35 c-237-177-40-045 kernel: RIP: 0010:mlx5e_tc_attach_mod_hdr+0x1cc/0x230 [mlx5_core]
      Feb 21 09:47:35 c-237-177-40-045 kernel: Call Trace:
      Feb 21 09:47:35 c-237-177-40-045 kernel:  <TASK>
      Feb 21 09:47:35 c-237-177-40-045 kernel:  mlx5e_tc_fib_event_work+0x8e3/0x1f60 [mlx5_core]
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? mlx5e_take_all_encap_flows+0xe0/0xe0 [mlx5_core]
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lock_downgrade+0x6d0/0x6d0
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x273/0x3f0
      Feb 21 09:47:35 c-237-177-40-045 kernel:  process_one_work+0x7c2/0x1310
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? pwq_dec_nr_in_flight+0x230/0x230
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? rwlock_bug.part.0+0x90/0x90
      Feb 21 09:47:35 c-237-177-40-045 kernel:  worker_thread+0x59d/0xec0
      Feb 21 09:47:35 c-237-177-40-045 kernel:  ? __kthread_parkme+0xd9/0x1d0
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE · c7dd225b
      Yevgeny Kliteynik authored
      SW Steering uses RC QP for writing STEs to ICM. This writing is done in LB
      (loopback), and FL (force-loopback) QP is preferred for performance. Whether
      FL is available, with RoCE enabled or disabled, is determined from RoCE caps.
      This patch adds reading of the FL capability from HCA caps in addition to the
      existing reading from RoCE caps, thus fixing the case where we didn't
      have loopback enabled when RoCE was disabled.
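
As a rough boolean model of the capability check, assuming the old logic could only see the FL bit through the RoCE caps when RoCE was enabled. The struct, its fields, and both functions are hypothetical; the actual capability layout and field names in the driver are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical view of the relevant capability bits. */
struct caps_sketch {
    bool roce_enabled;
    bool roce_fl_rc_qp;  /* FL bit as reported via the RoCE caps */
    bool hca_fl_rc_qp;   /* FL bit as reported via the general HCA caps */
};

/* Assumed old behaviour: FL is visible only through the RoCE caps. */
bool fl_supported_before(const struct caps_sketch *c)
{
    return c->roce_enabled && c->roce_fl_rc_qp;
}

/* Assumed new behaviour: the HCA cap is consulted in addition. */
bool fl_supported_after(const struct caps_sketch *c)
{
    return c->hca_fl_rc_qp || (c->roce_enabled && c->roce_fl_rc_qp);
}
```

In this model a device with RoCE disabled but with the HCA-level bit set is reported as FL-capable only by the second function, which is the case the patch targets.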
      
      Fixes: 7304d603 ("net/mlx5: DR, Add support for force-loopback QP")
      Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix crc32 calculation to work on big-endian (BE) CPUs · 1e5daf55
      Erez Shitrit authored
      When calculating the crc for the hash index we use the crc32 function, which
      calculates for little-endian (LE) archs.
      Then we convert it to network endianness using htonl(), but it's wrong
      to do the conversion on BE archs since the crc32 value is already LE.
      
      The solution is to swap the bytes of the crc result on all types
      of arch.
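
The difference between an htonl()-style conversion and an unconditional byte swap can be shown with a small sketch. Both helpers are hypothetical: `swap32_sketch` is a plain portable swap, and `htonl_model` imitates htonl() for a host endianness passed in as a parameter so that both cases can be compared on one machine.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value, independent of the host. */
uint32_t swap32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) |
           ((v & 0x0000ff00u) << 8)  |
           ((v & 0x00ff0000u) >> 8)  |
           ((v & 0xff000000u) >> 24);
}

/* Model of htonl(): a swap on little-endian hosts, a no-op on
 * big-endian hosts. host_is_big_endian is an illustration parameter. */
uint32_t htonl_model(uint32_t v, int host_is_big_endian)
{
    return host_is_big_endian ? v : swap32_sketch(v);
}
```

For the same crc value the htonl() model yields different results on LE and BE hosts, while the unconditional swap yields the same result everywhere, which is what the fix relies on.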
      
      Fixes: 40416d8e ("net/mlx5: DR, Replace CRC32 implementation to use kernel lib")
      Signed-off-by: Erez Shitrit <erezsh@nvidia.com>
      Reviewed-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Handle pairing of E-switch via uplink un/load APIs · 2be5bd42
      Shay Drory authored
      In case the user switches a device from switchdev mode to legacy mode, mlx5
      first unpairs the E-switch and afterwards unloads the uplink vport.
      On the other hand, in case the user removes or reloads a device, mlx5
      first unloads the uplink vport and afterwards unpairs the E-switch.
      
      The latter causes a bug [1]; hence, handle pairing of the E-switch as
      part of the uplink un/load APIs.
      
      [1]
      In case VF_LAG is used, every tc fdb flow is duplicated to the peer
      esw. However, the original esw keeps a pointer to this duplicated
      flow, not the peer esw.
      E.g.: if the user creates a tc fdb flow over esw0, the flow is duplicated
      over esw1 in FW/HW, but in SW, esw0 keeps a pointer to the duplicated
      flow.
      During module unload while a peer tc fdb flow is still offloaded, in
      case the first device to be removed is the peer device (esw1 in the
      example above), the peer net-dev is destroyed, and so the mlx5e_priv
      is memset to 0.
      Afterwards, the peer device tries to unpair itself from the
      original device (esw0 in the example above). The unpair API invokes the
      original device to clear the peer flow from its eswitch (esw0), but the
      peer flow, which is stored over the original eswitch (esw0),
      tries to use the peer mlx5e_priv, which is memset to 0, resulting in
      the kernel oops below.
      
      [  157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60
      [  157.964662 ] #PF: supervisor read access in kernel mode
      [  157.965123 ] #PF: error_code(0x0000) - not-present page
      [  157.965582 ] PGD 0 P4D 0
      [  157.965866 ] Oops: 0000 [#1] SMP
      [  157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core]
      [  157.976164 ] Call Trace:
      [  157.976437 ]  <TASK>
      [  157.976690 ]  __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core]
      [  157.977230 ]  mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core]
      [  157.977767 ]  mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core]
      [  157.984653 ]  mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core]
      [  157.985212 ]  mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core]
      [  157.985714 ]  esw_offloads_disable+0x5a/0x110 [mlx5_core]
      [  157.986209 ]  mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core]
      [  157.986757 ]  mlx5_eswitch_disable+0x51/0x80 [mlx5_core]
      [  157.987248 ]  mlx5_unload+0x2a/0xb0 [mlx5_core]
      [  157.987678 ]  mlx5_uninit_one+0x5f/0xd0 [mlx5_core]
      [  157.988127 ]  remove_one+0x64/0xe0 [mlx5_core]
      [  157.988549 ]  pci_device_remove+0x31/0xa0
      [  157.988933 ]  device_release_driver_internal+0x18f/0x1f0
      [  157.989402 ]  driver_detach+0x3f/0x80
      [  157.989754 ]  bus_remove_driver+0x70/0xf0
      [  157.990129 ]  pci_unregister_driver+0x34/0x90
      [  157.990537 ]  mlx5_cleanup+0xc/0x1c [mlx5_core]
      [  157.990972 ]  __x64_sys_delete_module+0x15a/0x250
      [  157.991398 ]  ? exit_to_user_mode_prepare+0xea/0x110
      [  157.991840 ]  do_syscall_64+0x3d/0x90
      [  157.992198 ]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: 04de7dda ("net/mlx5e: Infrastructure for duplicated offloading of TC flows")
      Fixes: 1418ddd9 ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      2be5bd42
    • Shay Drory's avatar
      net/mlx5: Collect command failures data only for known commands · 2a0a935f
      Shay Drory authored
      DEVX can issue a general command which is not used by the mlx5 driver.
      If such a command fails, mlx5 tries to collect the failure
      data. However, mlx5 doesn't create storage for this command, since
      mlx5 doesn't use it. This leads to an array-index-out-of-bounds error.
      
      Fix it by checking whether the command is known before collecting the
      failure data.
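      
      The check described above can be sketched in plain C. This is a minimal
      userspace model, not the driver code: the opcode values, the known_ops
      table, and the names op_to_slot() and record_failure() are all
      hypothetical and exist only to illustrate the "look up first, then
      index" pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; the real driver has its own command table. */
enum { OP_QUERY = 0x100, OP_CREATE = 0x200, OP_DESTROY = 0x201 };

static const unsigned int known_ops[] = { OP_QUERY, OP_CREATE, OP_DESTROY };
#define NUM_KNOWN (sizeof(known_ops) / sizeof(known_ops[0]))

/* One statistics slot per known command, and none for anything else. */
struct cmd_stats {
    unsigned long failed;
    int last_status;
};
static struct cmd_stats stats[NUM_KNOWN];

/* Return the storage slot for an opcode, or -1 if no slot was allocated. */
static int op_to_slot(unsigned int opcode)
{
    for (size_t i = 0; i < NUM_KNOWN; i++)
        if (known_ops[i] == opcode)
            return (int)i;
    return -1;
}

/* Record a failure only for known commands; returns 1 if recorded, else 0. */
static int record_failure(unsigned int opcode, int status)
{
    int slot = op_to_slot(opcode);

    if (slot < 0)
        return 0;   /* unknown command: skip instead of indexing out of bounds */
    stats[slot].failed++;
    stats[slot].last_status = status;
    return 1;
}
```

      In this model, a failure of an unknown opcode is simply not recorded,
      which mirrors the intent of the fix without claiming to match its code.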
      
      Fixes: 34f46ae0 ("net/mlx5: Add command failures data to debugfs")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      2a0a935f
    • Chuck Lever's avatar
      net/handshake: Fix sock->file allocation · 18c40a1c
      Chuck Lever authored
      	sock->file = sock_alloc_file(sock, O_NONBLOCK, NULL);
      	^^^^                         ^^^^
      
      sock_alloc_file() calls release_sock() on error but the left hand
      side of the assignment dereferences "sock".  This isn't the bug and
      I didn't report this earlier because there is an assert that it
      doesn't fail.
      
      net/handshake/handshake-test.c:221 handshake_req_submit_test4() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:233 handshake_req_submit_test4() warn: 'req' was already freed.
      net/handshake/handshake-test.c:254 handshake_req_submit_test5() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:290 handshake_req_submit_test6() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:321 handshake_req_cancel_test1() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:355 handshake_req_cancel_test2() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:367 handshake_req_cancel_test2() warn: 'req' was already freed.
      net/handshake/handshake-test.c:395 handshake_req_cancel_test3() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:407 handshake_req_cancel_test3() warn: 'req' was already freed.
      net/handshake/handshake-test.c:451 handshake_req_destroy_test1() error: dereferencing freed memory 'sock'
      net/handshake/handshake-test.c:463 handshake_req_destroy_test1() warn: 'req' was already freed.
      
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Fixes: 88232ec1 ("net/handshake: Add Kunit tests for the handshake consumer API")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Link: https://lore.kernel.org/r/168451609436.45209.15407022385441542980.stgit@oracle-102.nfsv4bat.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      18c40a1c
    • Chuck Lever's avatar
      net/handshake: Squelch allocation warning during Kunit test · b21c7ba6
      Chuck Lever authored
      The "handshake_req_alloc excessive privsize" kunit test is intended
      to check what happens when the maximum privsize is exceeded. The
      WARN_ON_ONCE_GFP at mm/page_alloc.c:4744 can be disabled safely for
      this test.
      
      Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
      Fixes: 88232ec1 ("net/handshake: Add Kunit tests for the handshake consumer API")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Link: https://lore.kernel.org/r/168451636052.47152.9600443326570457947.stgit@oracle-102.nfsv4bat.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b21c7ba6
    • Christophe JAILLET's avatar
      3c589_cs: Fix an error handling path in tc589_probe() · 640bf95b
      Christophe JAILLET authored
      Should tc589_config() fail, some resources need to be released as already
      done in the remove function.
      
      Fixes: 15b99ac1 ("[PATCH] pcmcia: add return value to _config() functions")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/d8593ae867b24c79063646e36f9b18b0790107cb.1684575975.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      640bf95b
    • Christophe JAILLET's avatar
      forcedeth: Fix an error handling path in nv_probe() · 5b17a497
      Christophe JAILLET authored
      If an error occurs after calling nv_mgmt_acquire_sema(), it should be
      undone with a corresponding nv_mgmt_release_sema() call.
      
      Add it in the error handling path of the probe as already done in the
      remove function.
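      
      The error-path pattern described above can be illustrated with a small
      userspace sketch. It is not the forcedeth code: acquire_sema(),
      release_sema(), probe(), and the fail_late parameter are hypothetical
      stand-ins that only show how a goto-based unwind releases what was
      acquired earlier.

```c
#include <assert.h>

static int sema_held;   /* stands in for the management semaphore */

static int acquire_sema(void)
{
    sema_held = 1;
    return 0;
}

static void release_sema(void)
{
    sema_held = 0;
}

/* fail_late simulates a failure in a step that runs after the acquire. */
static int probe(int fail_late)
{
    int err;

    err = acquire_sema();
    if (err)
        return err;

    if (fail_late) {
        err = -1;
        goto out_release;   /* undo the acquire before returning */
    }
    return 0;

out_release:
    release_sema();
    return err;
}
```

      After a failed probe(1) the semaphore is no longer held, which is the
      property the fix restores for the real probe function.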
      
      Fixes: cac1c52c ("forcedeth: mgmt unit interface")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
      Link: https://lore.kernel.org/r/355e9a7d351b32ad897251b6f81b5886fcdc6766.1684571393.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5b17a497
  2. 22 May, 2023 1 commit
    • Xin Long's avatar
      sctp: fix an issue that plpmtu can never go to complete state · 6ca328e9
      Xin Long authored
      When doing plpmtu probe, the probe size is growing every time when it
      receives the ACK during the Search state until the probe fails. When
      the failure occurs, pl.probe_high is set and it goes to the Complete
      state.
      
      However, if the link pmtu is huge, like 65535 in loopback_dev, the probe
      eventually keeps using SCTP_MAX_PLPMTU as the probe size and never fails.
      Because of that, pl.probe_high can not be set, and the plpmtu probe can
      never go to the Complete state.
      
      Fix it by setting pl.probe_high to SCTP_MAX_PLPMTU when the probe size
      grows to SCTP_MAX_PLPMTU in sctp_transport_pl_recv(). Also, do not allow
      the probe size to exceed SCTP_MAX_PLPMTU in the Complete state.
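      
      A heavily simplified model of the idea is sketched below. It is not
      the SCTP implementation: the constants MAX_PLPMTU and BIG_STEP, the
      struct pl layout, and the helpers pl_recv_ack() and
      acks_until_complete() are assumptions made for illustration, and probe
      failures are left out entirely. It only shows that treating "probe at
      the cap was acknowledged" as an upper bound lets the search terminate.

```c
#include <assert.h>

#define MAX_PLPMTU 9000   /* illustrative cap on the probe size */
#define BIG_STEP   32     /* illustrative growth per acknowledged probe */

enum pl_state { PL_SEARCH, PL_COMPLETE };

struct pl {
    enum pl_state state;
    int pmtu;         /* largest size confirmed so far */
    int probe_size;   /* size of the next probe */
    int probe_high;   /* upper bound once known, 0 while unknown */
};

/* Handle an ACK for the current probe while in the Search state. */
static void pl_recv_ack(struct pl *p)
{
    if (p->state != PL_SEARCH)
        return;

    p->pmtu = p->probe_size;
    if (p->probe_size >= MAX_PLPMTU) {
        /* The cap itself was acknowledged: record it as the upper bound. */
        p->probe_high = MAX_PLPMTU;
        p->state = PL_COMPLETE;
        return;
    }

    p->probe_size += BIG_STEP;
    if (p->probe_size > MAX_PLPMTU)
        p->probe_size = MAX_PLPMTU;   /* never probe beyond the cap */
}

/* Count ACKs needed to reach Complete on a link that never drops a probe. */
static int acks_until_complete(int start, int limit)
{
    struct pl p = { PL_SEARCH, 0, start, 0 };

    for (int i = 1; i <= limit; i++) {
        pl_recv_ack(&p);
        if (p.state == PL_COMPLETE)
            return i;
    }
    return -1;   /* never completed within the limit */
}
```

      Without the branch that sets probe_high at the cap, this model would
      keep probing at MAX_PLPMTU forever on a link such as loopback.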
      
      Fixes: b87641af ("sctp: do state transition when a probe succeeds on HB ACK recv path")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ca328e9
  3. 20 May, 2023 2 commits
    • Jakub Kicinski's avatar
      Merge tag 'for-net-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 67caf26d
      Jakub Kicinski authored
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth pull request for net:
      
       - Fix compiler warnings on btnxpuart
       - Fix potential double free on hci_conn_unlink
       - Fix UAF on hci_conn_hash_flush
      
      * tag 'for-net-2023-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
        Bluetooth: btnxpuart: Fix compiler warnings
        Bluetooth: Unlink CISes when LE disconnects in hci_conn_del
        Bluetooth: Fix UAF in hci_conn_hash_flush again
        Bluetooth: Refcnt drop must be placed last in hci_conn_unlink
        Bluetooth: Fix potential double free caused by hci_conn_unlink
      ====================
      
      Link: https://lore.kernel.org/r/20230519233056.2024340-1-luiz.dentz@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      67caf26d
    • Taehee Yoo's avatar
      net: fix stack overflow when LRO is disabled for virtual interfaces · ae9b15fb
      Taehee Yoo authored
      When a virtual interface's features are updated, it synchronizes the
      updated features to its own lower interfaces.
      This propagation logic should work iteratively, not recursively.
      But it unexpectedly works recursively because of the netdev notification.
      This problem occurs when LRO is disabled, and only for the team and
      bonding interface types.
      
             team0
               |
        +------+------+-----+-----+
        |      |      |     |     |
      team1  team2  team3  ...  team200
      
      If team0's LRO feature is updated, it generates the NETDEV_FEAT_CHANGE
      event for its own lower interfaces (team1 ~ team200).
      This is done by netdev_sync_lower_features().
      So, the NETDEV_FEAT_CHANGE notification logic of each lower interface
      works iteratively.
      But the generated NETDEV_FEAT_CHANGE event is also sent to the upper
      interface.
      The upper interface (team0) generates the NETDEV_FEAT_CHANGE event for
      its own lower interfaces again.
      The lower and upper interfaces receive this event and generate it
      again and again.
      So, a stack overflow occurs.
      
      But it is not an infinite loop issue,
      because netdev_sync_lower_features() updates features before
      generating the NETDEV_FEAT_CHANGE event.
      Already synchronized lower interfaces skip the notification logic.
      So, the problem is just that the iteration logic unexpectedly becomes
      recursive due to the notification mechanism.
      
      Reproducer:
      
      ip link add team0 type team
      ethtool -K team0 lro on
      for i in {1..200}
      do
              ip link add team$i master team0 type team
              ethtool -K team$i lro on
      done
      
      ethtool -K team0 lro off
      
      In order to fix it, the notifier_ctx member of bonding/team is introduced.
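      
      The effect of such a flag can be modelled in userspace C. The sketch
      below is not the team/bonding code: struct upper, sync_lower(),
      lower_changed(), measure_depth(), and the use_guard switch are
      hypothetical, and NUM_LOWER is kept small. It only compares call depth
      with and without a re-entrancy guard named after notifier_ctx.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LOWER 8

struct upper {
    bool lro;                    /* feature state of the upper device */
    bool lower_lro[NUM_LOWER];   /* feature state of each lower device */
    bool notifier_ctx;           /* true while a sync triggered by an event runs */
    bool use_guard;              /* model switch: behave as before or after the fix */
    int depth, max_depth;        /* current and deepest nesting of sync_lower() */
};

static void sync_lower(struct upper *u);

/* Called when a lower device reports a feature change to its upper device. */
static void lower_changed(struct upper *u)
{
    if (!u->use_guard) {
        sync_lower(u);           /* unguarded: every event re-enters the sync */
        return;
    }
    if (u->notifier_ctx)
        return;                  /* a sync is already running: do not re-enter */
    u->notifier_ctx = true;
    sync_lower(u);
    u->notifier_ctx = false;
}

/* Push the upper device's feature state down to every lower device. */
static void sync_lower(struct upper *u)
{
    if (++u->depth > u->max_depth)
        u->max_depth = u->depth;
    for (int i = 0; i < NUM_LOWER; i++) {
        if (u->lower_lro[i] == u->lro)
            continue;            /* already in sync: no event generated */
        u->lower_lro[i] = u->lro;
        lower_changed(u);        /* the change event travels back up */
    }
    u->depth--;
}

/* Disable LRO on the upper device and report the deepest nesting reached. */
static int measure_depth(bool use_guard)
{
    struct upper u = { .lro = true, .use_guard = use_guard };

    for (int i = 0; i < NUM_LOWER; i++)
        u.lower_lro[i] = true;
    u.lro = false;
    sync_lower(&u);
    return u.max_depth;
}
```

      In this model the unguarded depth grows with the number of lower
      devices, while the guarded depth stays constant.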
      
      Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
      Fixes: fd867d51 ("net/core: generic support for disabling netdev features down stack")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
      Link: https://lore.kernel.org/r/20230517143010.3596250-1-ap420073@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ae9b15fb
  4. 19 May, 2023 21 commits
    • Neeraj Sanjay Kale's avatar
      Bluetooth: btnxpuart: Fix compiler warnings · 6ce5169e
      Neeraj Sanjay Kale authored
      This fixes the following compiler warning reported by kernel test robot:
      
        drivers/bluetooth/btnxpuart.c:1332:34: warning: unused variable
        'nxpuart_of_match_table' [-Wunused-const-variable]
      
      Signed-off-by: Neeraj Sanjay Kale <neeraj.sanjaykale@nxp.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202305161345.eClvTYQ9-lkp@intel.com/
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      6ce5169e
    • Ruihan Li's avatar
      Bluetooth: Unlink CISes when LE disconnects in hci_conn_del · a2904d28
      Ruihan Li authored
      Currently, hci_conn_del calls hci_conn_unlink for BR/EDR, (e)SCO, and
      CIS connections, i.e., everything except LE connections. However, if
      (e)SCO connections are unlinked when BR/EDR disconnects, CIS connections
      should also be unlinked when LE disconnects.
      
      In terms of disconnection behavior, CIS and (e)SCO connections are not
      too different. One peculiarity of CIS is that when CIS connections are
      disconnected, the CIS handle isn't deleted, as per [BLUETOOTH CORE
      SPECIFICATION Version 5.4 | Vol 4, Part E] 7.1.6 Disconnect command:
      
              All SCO, eSCO, and CIS connections on a physical link should be
              disconnected before the ACL connection on the same physical
              connection is disconnected. If it does not, they will be
              implicitly disconnected as part of the ACL disconnection.
              ...
              Note: As specified in Section 7.7.5, on the Central, the handle
              for a CIS remains valid even after disconnection and, therefore,
              the Host can recreate a disconnected CIS at a later point in
              time using the same connection handle.
      
      Since hci_conn_link invokes both hci_conn_get and hci_conn_hold,
      hci_conn_unlink should perform both hci_conn_put and hci_conn_drop as
      well. However, currently it performs only hci_conn_put.
      
      This patch makes hci_conn_unlink call hci_conn_drop as well, which
      simplifies the logic in hci_conn_del a bit and may benefit future users
      of hci_conn_unlink. But it is noted that this change additionally
      implies that hci_conn_unlink can queue disc_work on conn itself, with
      the following call stack:
      
              hci_conn_unlink(conn)  [conn->parent == NULL]
                      -> hci_conn_unlink(child)  [child->parent == conn]
                              -> hci_conn_drop(child->parent)
                                      -> queue_delayed_work(&conn->disc_work)
      
      Queued disc_work after hci_conn_del can be spurious, so during the
      process of hci_conn_del, it is necessary to make the call to
      cancel_delayed_work(&conn->disc_work) after invoking hci_conn_unlink.
      
      Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
      Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      a2904d28
    • Ruihan Li's avatar
      Bluetooth: Fix UAF in hci_conn_hash_flush again · a2ac591c
      Ruihan Li authored
      Commit 06149746 ("Bluetooth: hci_conn: Add support for linking
      multiple hcon") reintroduced a previously fixed bug [1] ("KASAN:
      slab-use-after-free Read in hci_conn_hash_flush"). This bug was
      originally fixed by commit 5dc7d23e ("Bluetooth: hci_conn: Fix
      possible UAF").
      
      The hci_conn_unlink function was added to avoid invalidating the link
      traversal caused by successive hci_conn_del operations releasing extra
      connections. However, currently hci_conn_unlink itself also releases
      extra connections, resulting in the reintroduced bug.
      
      This patch follows a more robust solution for cleaning up all
      connections, by repeatedly removing the first connection until there are
      none left. This approach does not rely on the inner workings of
      hci_conn_del and ensures proper cleanup of all connections.
      
      Meanwhile, we need to make sure that hci_conn_del never fails. Indeed it
      doesn't, as it now always returns zero. To make this a bit clearer, this
      patch also changes its return type to void.
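      
      The "remove the first entry until the list is empty" approach can be
      sketched in userspace C as follows. This is not the Bluetooth code:
      struct conn, conn_add(), conn_del(), and flush_all() are hypothetical,
      and a plain singly linked list replaces the kernel list. The point is
      that the loop never holds a saved "next" pointer across a deletion
      that may release more than one entry.

```c
#include <assert.h>
#include <stdlib.h>

struct conn {
    struct conn *next;
    struct conn *child;   /* linked connection released together with this one */
};

static struct conn *head;

/* Remove c from the global list if it is still on it. */
static void unlink_conn(struct conn *c)
{
    struct conn **pp = &head;

    while (*pp && *pp != c)
        pp = &(*pp)->next;
    if (*pp)
        *pp = c->next;
}

/* Delete a connection and, first, any connection linked below it. */
static void conn_del(struct conn *c)
{
    if (c->child) {
        struct conn *child = c->child;

        c->child = NULL;
        conn_del(child);
    }
    unlink_conn(c);
    free(c);
}

/* Allocate a connection, optionally linked to a child, and list it. */
static struct conn *conn_add(struct conn *child)
{
    struct conn *c = calloc(1, sizeof(*c));

    if (!c)
        abort();
    c->child = child;
    c->next = head;
    head = c;
    return c;
}

/* Flush by always taking the current first entry; returns rounds needed. */
static int flush_all(void)
{
    int rounds = 0;

    while (head) {
        conn_del(head);   /* may remove several entries, so re-read head */
        rounds++;
    }
    return rounds;
}
```

      Because each round re-reads head, it does not matter how many entries
      a single conn_del() call releases.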
      
      Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-bluetooth/000000000000aa920505f60d25ad@google.com/
      Fixes: 06149746 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
      Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
      Co-developed-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      a2ac591c
    • Ruihan Li's avatar
      Bluetooth: Refcnt drop must be placed last in hci_conn_unlink · 2910431a
      Ruihan Li authored
      If hci_conn_put(conn->parent) reduces conn->parent's reference count to
      zero, it can immediately deallocate conn->parent. At the same time,
      conn->link->list has its head in conn->parent, causing use-after-free
      problems in the latter list_del_rcu(&conn->link->list).
      
      This problem can be easily solved by reordering the two operations,
      i.e., first performing the list removal with list_del_rcu and then
      decreasing the refcnt with hci_conn_put.
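      
      The ordering can be shown with a small userspace model. It is not the
      hci_conn code and uses no RCU: struct parent, struct child,
      parent_put(), child_unlink(), and demo_unlink_last_child() are
      hypothetical. The list head is embedded in the parent, so the child
      must leave the list before the reference that may free the parent is
      dropped.

```c
#include <assert.h>
#include <stdlib.h>

struct link {
    struct link *prev, *next;
};

struct parent {
    int refcnt;
    struct link children;   /* list head lives inside the parent object */
};

struct child {
    struct parent *parent;
    struct link link;
};

static int parents_freed;

static struct parent *parent_new(void)
{
    struct parent *p = malloc(sizeof(*p));

    if (!p)
        abort();
    p->refcnt = 0;
    p->children.prev = p->children.next = &p->children;
    return p;
}

/* Drop one reference; the parent is freed when the count reaches zero. */
static void parent_put(struct parent *p)
{
    if (--p->refcnt == 0) {
        free(p);
        parents_freed++;
    }
}

/* Attach a child to the parent's list and take a reference on the parent. */
static void child_link(struct child *c, struct parent *p)
{
    c->parent = p;
    p->refcnt++;
    c->link.next = p->children.next;
    c->link.prev = &p->children;
    p->children.next->prev = &c->link;
    p->children.next = &c->link;
}

static void list_del(struct link *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
    l->prev = l->next = l;
}

static void child_unlink(struct child *c)
{
    struct parent *p = c->parent;

    /* 1. Leave the list while the head inside the parent is still valid. */
    list_del(&c->link);
    c->parent = NULL;
    /* 2. Only now drop the reference, which may free the parent. */
    parent_put(p);
}

/* Link one child, unlink it again, and report how many parents were freed. */
static int demo_unlink_last_child(void)
{
    struct parent *p = parent_new();
    struct child c;

    child_link(&c, p);
    child_unlink(&c);
    return parents_freed;
}
```

      Swapping steps 1 and 2 in child_unlink() would make list_del() write
      into freed memory whenever the child held the last reference.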
      
      Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
      Closes: https://lore.kernel.org/linux-bluetooth/CABBYNZ+1kce8_RJrLNOXd_8=Mdpb=2bx4Nto-hFORk=qiOkoCg@mail.gmail.com/
      Fixes: 06149746 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
      Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      2910431a
    • Ruihan Li's avatar
      Bluetooth: Fix potential double free caused by hci_conn_unlink · ca1fd42e
      Ruihan Li authored
      The hci_conn_unlink function is being called by hci_conn_del, which
      means it should not call hci_conn_del with the input parameter conn
      again. If it does, conn may have already been released when
      hci_conn_unlink returns, leading to potential UAF and double-free
      issues.
      
      This patch resolves the problem by modifying hci_conn_unlink to release
      only conn's child links when necessary, but never release conn itself.
      
      Reported-by: syzbot+690b90b14f14f43f4688@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/linux-bluetooth/000000000000484a8205faafe216@google.com/
      Fixes: 06149746 ("Bluetooth: hci_conn: Add support for linking multiple hcon")
      Signed-off-by: Ruihan Li <lrh2000@pku.edu.cn>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Reported-by: syzbot+690b90b14f14f43f4688@syzkaller.appspotmail.com
      Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
      Reported-by: syzbot+8bb72f86fc823817bc5d@syzkaller.appspotmail.com
      ca1fd42e
    • Shenwei Wang's avatar
      net: fec: add dma_wmb to ensure correct descriptor values · 9025944f
      Shenwei Wang authored
      The XDP TX path is changed as follows. Two dma_wmb() calls ensure
      proper ordering of descriptor and buffer updates (1 and 2), and
      transmission is started right after the descriptor is configured (3):
      1. A dma_wmb() is added after updating the last BD to make sure
         the updates to the rest of the descriptor are visible before
         transferring ownership to FEC.
      2. A dma_wmb() is also added after updating the bdp to ensure these
         updates are visible before updating txq->bd.cur.
      3. Start the xmit of the frame immediately after configuring the
         tx descriptor.
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com>
      Reviewed-by: Wei Fang <wei.fang@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9025944f
    • Vladimir Oltean's avatar
      MAINTAINERS: add myself as maintainer for enetc · 3be5f6cd
      Vladimir Oltean authored
      I would like to be copied on new patches submitted on this driver.
      I am relatively familiar with the code, having practically maintained
      it for a while.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3be5f6cd
    • Sunil Goutham's avatar
      octeontx2-pf: Fix TSOv6 offload · de678ca3
      Sunil Goutham authored
      HW adds segment size to the payload length
      in the IPv6 header. Fix payload length to
      just TCP header length instead of 'TCP header
      size + IPv6 header size'.
      
      Fixes: 86d74760 ("octeontx2-pf: TCP segmentation offload support")
      Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
      Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de678ca3
    • Alejandro Lucero's avatar
      sfc: fix devlink info error handling · cfcb9428
      Alejandro Lucero authored
      Avoid early devlink info return if errors arise with MCDI commands
      executed for getting the required info from the device. The rationale
      is that some commands can fail but later ones could still give useful
      data. Moreover, some nvram partitions may not be present, which needs
      to be handled as a non-error.
      
      The specific errors are reported through system messages and if any
      error appears, it will be reported generically through extack.
      
      Fixes: 14743ddd ("sfc: add devlink info support for ef100")
      Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com>
      Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfcb9428
    • Wen Gu's avatar
      net/smc: Reset connection when trying to use SMCRv2 fails. · 35112271
      Wen Gu authored
      We found a crash when using SMCRv2 with 2 Mellanox ConnectX-4. It
      can be reproduced by:
      
      - smc_run nginx
      - smc_run wrk -t 32 -c 500 -d 30 http://<ip>:<port>
      
       BUG: kernel NULL pointer dereference, address: 0000000000000014
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 8000000108713067 P4D 8000000108713067 PUD 151127067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 4 PID: 2441 Comm: kworker/4:249 Kdump: loaded Tainted: G        W   E      6.4.0-rc1+ #42
       Workqueue: smc_hs_wq smc_listen_work [smc]
       RIP: 0010:smc_clc_send_confirm_accept+0x284/0x580 [smc]
       RSP: 0018:ffffb8294b2d7c78 EFLAGS: 00010a06
       RAX: ffff8f1873238880 RBX: ffffb8294b2d7dc8 RCX: 0000000000000000
       RDX: 00000000000000b4 RSI: 0000000000000001 RDI: 0000000000b40c00
       RBP: ffffb8294b2d7db8 R08: ffff8f1815c5860c R09: 0000000000000000
       R10: 0000000000000400 R11: 0000000000000000 R12: ffff8f1846f56180
       R13: ffff8f1815c5860c R14: 0000000000000001 R15: 0000000000000001
       FS:  0000000000000000(0000) GS:ffff8f1aefd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000014 CR3: 00000001027a0001 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        ? mlx5_ib_map_mr_sg+0xa1/0xd0 [mlx5_ib]
        ? smcr_buf_map_link+0x24b/0x290 [smc]
        ? __smc_buf_create+0x4ee/0x9b0 [smc]
        smc_clc_send_accept+0x4c/0xb0 [smc]
        smc_listen_work+0x346/0x650 [smc]
        ? __schedule+0x279/0x820
        process_one_work+0x1e5/0x3f0
        worker_thread+0x4d/0x2f0
        ? __pfx_worker_thread+0x10/0x10
        kthread+0xe5/0x120
        ? __pfx_kthread+0x10/0x10
        ret_from_fork+0x2c/0x50
        </TASK>
      
      During the CLC handshake, server sequentially tries available SMCRv2
      and SMCRv1 devices in smc_listen_work().
      
      If an SMCRv2 device is found, an SMCv2 based link group and link will
      be assigned to the connection. Now assume that some buffer assignment
      errors happen later in the CLC handshake, such as an RMB registration
      failure. The server will give up SMCRv2 and try an SMCRv1 device
      instead, but the resources assigned to the connection won't be reset.
      
      When server tries SMCRv1 device, the connection creation process will
      be executed again. Since conn->lnk has been assigned when trying SMCRv2,
      it will not be set to the correct SMCRv1 link in
      smcr_lgr_conn_assign_link(). So in such situation, conn->lgr points to
      correct SMCRv1 link group but conn->lnk points to the SMCRv2 link
      mistakenly.
      
      Then in smc_clc_send_confirm_accept(), conn->rmb_desc->mr[link->link_idx]
      will be accessed. Since link->link_idx is not correct, the related
      MR may not have been initialized, so the crash happens.
      
       | Try SMCRv2 device first
       |     |-> conn->lgr:	assign existed SMCRv2 link group;
       |     |-> conn->link:	assign existed SMCRv2 link (link_idx may be 1 in SMC_LGR_SYMMETRIC);
       |     |-> sndbuf & RMB creation fails, quit;
       |
       | Try SMCRv1 device then
       |     |-> conn->lgr:	create SMCRv1 link group and assign;
       |     |-> conn->link:	keep SMCRv2 link mistakenly;
       |     |-> sndbuf & RMB creation succeed, only RMB->mr[link_idx = 0]
       |         initialized.
       |
       | Then smc_clc_send_confirm_accept() accesses
       | conn->rmb_desc->mr[conn->link->link_idx, which is 1], then crash.
       v
      
      This patch fixes this by clearing conn->lnk before assigning the
      link. In addition, it resets the connection and cleans up the
      resources assigned if trying SMCRv2 failed in buffer creation or
      registration.
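      
      The stale-pointer problem can be reduced to a few lines of userspace
      C. This is a model under assumptions, not the SMC code: struct lgr,
      struct conn, assign_link(), the active_idx field, and
      link_idx_after_fallback() are hypothetical. The model assumes, as the
      description above states, that an already set conn->lnk is kept
      unless it is cleared first.

```c
#include <assert.h>
#include <stddef.h>

struct link {
    int link_idx;
};

struct lgr {
    struct link links[2];
    int active_idx;       /* which link this group would hand out */
};

struct conn {
    struct lgr *lgr;
    struct link *lnk;
};

/* Assign group and link; reset_first models clearing conn->lnk beforehand. */
static void assign_link(struct conn *c, struct lgr *g, int reset_first)
{
    c->lgr = g;
    if (reset_first)
        c->lnk = NULL;    /* drop any link left over from an earlier attempt */
    if (!c->lnk)
        c->lnk = &g->links[g->active_idx];
}

/* Try a "v2" group first, then fall back to "v1"; return the final link_idx. */
static int link_idx_after_fallback(int reset_first)
{
    struct lgr v2 = { { { 0 }, { 1 } }, 1 };   /* hands out link_idx 1 */
    struct lgr v1 = { { { 0 }, { 1 } }, 0 };   /* hands out link_idx 0 */
    struct conn c = { NULL, NULL };

    assign_link(&c, &v2, reset_first);   /* first attempt, later abandoned */
    assign_link(&c, &v1, reset_first);   /* fallback attempt */
    return c.lnk->link_idx;
}
```

      Without the reset, the connection ends up with the v1 group but the
      v2 link index, which is the mismatch that leads to the uninitialized
      MR access.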
      
      Fixes: e49300a6 ("net/smc: add listen processing for SMC-Rv2")
      Link: https://lore.kernel.org/r/20220523055056.2078994-1-liuyacan@corp.netease.com/
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35112271
    • Po-Hsu Lin's avatar
      selftests: fib_tests: mute cleanup error message · d226b1df
      Po-Hsu Lin authored
      At the end of the test, there will be an error message induced by the
      `ip netns del ns1` command in cleanup():
      
        Tests passed: 201
        Tests failed:   0
        Cannot remove namespace file "/run/netns/ns1": No such file or directory
      
      This can even be reproduced with just `./fib_tests.sh -h` as we're
      calling cleanup() on exit.
      
      Redirect the error message to /dev/null to mute it.
      
      V2: Update commit message and fixes tag.
      V3: resubmit due to missing netdev ML in V2
      
      Fixes: b60417a9 ("selftest: fib_tests: Always cleanup before exit")
      Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d226b1df
    • Jakub Kicinski's avatar
      net/mlx5e: do as little as possible in napi poll when budget is 0 · afbed3f7
      Jakub Kicinski authored
      NAPI gets called with budget of 0 from netpoll, which has interrupts
      disabled. We should try to free some space on Tx rings and nothing
      else.
      
      Specifically do not try to handle XDP TX or try to refill Rx buffers -
      we can't use the page pool from IRQ context. Don't check if IRQs moved,
      either, that makes no sense in netpoll. Netpoll calls _all_ the rings
      from whatever CPU it happens to be invoked on.
      
      In general do as little as possible, the work quickly adds up when
      there's tens of rings to poll.
      
      The immediate stack trace I was seeing is:
      
          __do_softirq+0xd1/0x2c0
          __local_bh_enable_ip+0xc7/0x120
          </IRQ>
          <TASK>
          page_pool_put_defragged_page+0x267/0x320
          mlx5e_free_xdpsq_desc+0x99/0xd0
          mlx5e_poll_xdpsq_cq+0x138/0x3b0
          mlx5e_napi_poll+0xc3/0x8b0
          netpoll_poll_dev+0xce/0x150
      
      AFAIU page pool takes a BH lock, releases it and since BH is now
      enabled tries to run softirqs.
      
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Fixes: 60bbf7ee ("mlx5: use page_pool for xdp_return_frame call")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      afbed3f7
    • David S. Miller's avatar
      Merge branch 'tls-fixes' · 2897041e
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      tls: rx: strp: fix inline crypto offload
      
      The local strparser version I added to TLS does not preserve
      decryption status, which breaks inline crypto (NIC offload).
      ====================
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2897041e
    • Jakub Kicinski's avatar
      tls: rx: strp: don't use GFP_KERNEL in softirq context · 74836ec8
      Jakub Kicinski authored
      When receive buffer is small, or the TCP rx queue looks too
      complicated to bother using it directly - we allocate a new
      skb and copy data into it.
      
      We already use sk->sk_allocation... but nothing actually
      sets it to GFP_ATOMIC on the ->sk_data_ready() path.
      
      Users of HW offload are far more likely to experience problems
      due to scheduling while atomic. "Copy mode" is very rarely
      triggered with SW crypto.
      
      Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      74836ec8
    • Jakub Kicinski's avatar
      tls: rx: strp: preserve decryption status of skbs when needed · eca9bfaf
      Jakub Kicinski authored
      When receive buffer is small we try to copy out the data from
      TCP into a skb maintained by TLS to prevent connection from
      stalling. Unfortunately if a single record is made up of a mix
      of decrypted and non-decrypted skbs combining them into a single
      skb leads to loss of decryption status, resulting in decryption
      errors or data corruption.
      
      Similarly when trying to use TCP receive queue directly we need
      to make sure that all the skbs within the record have the same
      status. If we don't the mixed status will be detected correctly
      but we'll CoW the anchor, again collapsing it into a single paged
      skb without decrypted status preserved. So the "fixup" code will
      not know which parts of skb to re-encrypt.
      
      Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eca9bfaf
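The mixed-status condition described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `rec_status` enum and both helper functions are hypothetical names invented for illustration, and a plain `bool` array stands in for the per-skb decrypted flag. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the input buffers that make up one record. */
enum rec_status { REC_NONE_DECRYPTED, REC_ALL_DECRYPTED, REC_MIXED };

/* Compare every buffer's flag against the first one; any mismatch means mixed. */
static enum rec_status rec_decrypt_status(const bool *decrypted, size_t n)
{
    size_t i;

    assert(decrypted != NULL && n > 0);
    for (i = 1; i < n; i++) {
        if (decrypted[i] != decrypted[0])
            return REC_MIXED;
    }
    return decrypted[0] ? REC_ALL_DECRYPTED : REC_NONE_DECRYPTED;
}

/*
 * A mixed record cannot be merged into one buffer carrying a single flag,
 * so in this model it has to be copied buffer by buffer.
 */
static bool rec_needs_per_buffer_copy(const bool *decrypted, size_t n)
{
    return rec_decrypt_status(decrypted, n) == REC_MIXED;
}
```

In this model, a record whose buffers carry the flags {true, false, true} is classified as mixed, which is the case where merging would lose the per-buffer status.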
    • Jakub Kicinski's avatar
      tls: rx: strp: factor out copying skb data · c1c607b1
      Jakub Kicinski authored
      We'll need to copy input skbs individually in the next patch.
      Factor that code out (without assuming we're copying a full record).
      
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1c607b1
    • Jakub Kicinski's avatar
      tls: rx: strp: fix determining record length in copy mode · 8b0c0dc9
      Jakub Kicinski authored
      We call tls_rx_msg_size(skb) before doing skb->len += chunk.
      So the tls_rx_msg_size() code will see old skb->len, most
      likely leading to an over-read.
      
      Worst case we will over-read an entire record; the next iteration
      will try to trim the skb but may end up turning frag len negative
      or discarding the subsequent record (since we already told TCP
      we've read it during previous read but now we'll trim it out of
      the skb).
      
      Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b0c0dc9
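The ordering problem in the commit above (sizing the record before the length is updated) can be shown with a simplified userspace model. This is a sketch, not kernel code: `struct rec_buf`, `rec_msg_size()` and `rec_append()` are hypothetical names, and the only outside fact assumed is the 5-byte TLS record header (type, version, 16-bit length).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_LEN 5 /* TLS record header: type (1) + version (2) + length (2) */

struct rec_buf {
    unsigned char data[64];
    size_t len; /* bytes valid so far; plays the role of skb->len */
};

/* Full record size once the header is present, 0 if more data is needed. */
static size_t rec_msg_size(const struct rec_buf *b)
{
    if (b->len < HDR_LEN)
        return 0;
    return HDR_LEN + (((size_t)b->data[3] << 8) | b->data[4]);
}

/* Append a chunk, then size the record using the updated length. */
static size_t rec_append(struct rec_buf *b, const unsigned char *chunk, size_t n)
{
    assert(b->len + n <= sizeof(b->data));
    memcpy(b->data + b->len, chunk, n);
    b->len += n;            /* update the length first ... */
    return rec_msg_size(b); /* ... so the parser sees the new bytes */
}
```

If `rec_msg_size()` were called before `b->len += n`, a header completed by the current chunk would still look incomplete, and the caller would keep reading past the end of the record.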
    • Jakub Kicinski's avatar
      tls: rx: strp: force mixed decrypted records into copy mode · 14c4be92
      Jakub Kicinski authored
      If a record is partially decrypted we'll have to CoW it, anyway,
      so go into copy mode and allocate a writable skb right away.
      
      This will make subsequent fix simpler because we won't have to
      teach tls_strp_msg_make_copy() how to copy skbs while preserving
      decrypt status.
      
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14c4be92
    • Jakub Kicinski's avatar
      tls: rx: strp: set the skb->len of detached / CoW'ed skbs · 210620ae
      Jakub Kicinski authored
      alloc_skb_with_frags() fills in page frag sizes but does not
      set skb->len and skb->data_len. Set those correctly otherwise
      device offload will most likely generate an empty skb and
      hit the BUG() at the end of __skb_nsg().
      
      Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser")
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      210620ae
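A minimal model of the bookkeeping described above: the allocator fills in per-fragment sizes, and the caller still has to set the total and paged lengths. `struct buf_model` and both helpers are hypothetical names for illustration, and the model assumes no linear data, so both lengths equal the fragment total. It is not the kernel's `struct sk_buff`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FRAGS 17 /* arbitrary cap chosen for this sketch */

struct buf_model {
    unsigned int len;      /* total bytes; plays the role of skb->len */
    unsigned int data_len; /* bytes held in fragments; like skb->data_len */
    unsigned int nr_frags;
    unsigned int frag_size[MAX_FRAGS];
};

/* Derive both length fields from the fragment sizes already filled in. */
static void buf_set_len_from_frags(struct buf_model *b)
{
    unsigned int i, total = 0;

    assert(b != NULL && b->nr_frags <= MAX_FRAGS);
    for (i = 0; i < b->nr_frags; i++)
        total += b->frag_size[i];
    b->data_len = total;
    b->len = total; /* assumes an empty linear area */
}

/* A consumer that trusts len would treat an unset buffer as empty. */
static int buf_is_empty(const struct buf_model *b)
{
    return b->len == 0;
}
```

Until `buf_set_len_from_frags()` runs, `buf_is_empty()` reports the buffer as empty even though its fragments hold data, which mirrors the "empty skb" symptom in the commit message.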
    • Jakub Kicinski's avatar
      tls: rx: device: fix checking decryption status · b3a03b54
      Jakub Kicinski authored
      skb->len covers the entire skb, including the frag_list.
      In fact we're guaranteed that rxm->full_len <= skb->len,
      so since the change under Fixes we were not checking decrypt
      status of any skb but the first.
      
      Note that the skb_pagelen() added here may feel a bit costly,
      but it's removed by subsequent fixes, anyway.
      
      Reported-by: Tariq Toukan <tariqt@nvidia.com>
      Fixes: 86b259f6 ("tls: rx: device: bound the frag walk")
      Tested-by: Shai Amiram <samiram@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3a03b54
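The difference between "length of the whole chain" and "length held by the first buffer" can be sketched in userspace. `struct seg` and its helpers are hypothetical stand-ins: `seg_total_len()` plays the role of `skb->len` (which includes the frag_list) and `seg_pagelen()` the role of `skb_pagelen()`. The real driver logic is more involved, so treat this as an illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct seg {
    unsigned int headlen;    /* linear bytes in this segment */
    unsigned int frag_bytes; /* paged bytes in this segment */
    const struct seg *next;  /* simplified stand-in for frag_list chaining */
};

/* Bytes held directly by one segment. */
static unsigned int seg_pagelen(const struct seg *s)
{
    assert(s != NULL);
    return s->headlen + s->frag_bytes;
}

/* Bytes held by this segment plus everything chained behind it. */
static unsigned int seg_total_len(const struct seg *s)
{
    unsigned int total = 0;

    for (; s != NULL; s = s->next)
        total += s->headlen + s->frag_bytes;
    return total;
}

/* True when a record of full_len bytes reaches past the first segment. */
static bool record_spills_past_first(const struct seg *first, unsigned int full_len)
{
    return full_len > seg_pagelen(first);
}
```

Because the record always fits within the total length, a check of the form `full_len > seg_total_len(first)` can never fire; only the comparison against the first segment's own length tells the caller that later segments need inspecting too.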
    • Tudor Ambarus's avatar
      net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize · 7e01c7f7
      Tudor Ambarus authored
      Currently in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is lower than
      the calculated "min" value, but greater than zero, the logic sets
      tx_max to dwNtbOutMaxSize. This is then used to allocate a new SKB in
      cdc_ncm_fill_tx_frame() where all the data is handled.
      
      For small values of dwNtbOutMaxSize the memory allocated during
      alloc_skb(dwNtbOutMaxSize, GFP_ATOMIC) will have the same size, due to
      how size is aligned at alloc time:
      	size = SKB_DATA_ALIGN(size);
      	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
      Thus we hit the same bug that we tried to squash with
      commit 2be6d4d1 ("net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero")
      
      Low values of dwNtbOutMaxSize do not cause an issue presently because at
      alloc_skb() time more memory (512b) is allocated than required for the
      SKB headers alone (320b), leaving some space (512b - 320b = 192b)
      for CDC data (172b).
      
      However, if more elements (for example 3 x u64 = [24b]) were added to
      one of the SKB header structs, say 'struct skb_shared_info',
      increasing its original size (320b [320b aligned]) to something larger
      (344b [384b aligned]), then suddenly the CDC data (172b) no longer
      fits in the spare SKB data area (512b - 384b = 128b).
      
      Consequently the SKB bounds check fails and the kernel panics:
      
      skbuff: skb_over_panic: text:ffffffff831f755b len:184 put:172 head:ffff88811f1c6c00 data:ffff88811f1c6c00 tail:0xb8 end:0x80 dev:<NULL>
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:113!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 57 Comm: kworker/0:2 Not tainted 5.15.106-syzkaller-00249-g19c0ed55a470 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
      Workqueue: mld mld_ifc_work
      RIP: 0010:skb_panic net/core/skbuff.c:113 [inline]
      RIP: 0010:skb_over_panic+0x14c/0x150 net/core/skbuff.c:118
      [snip]
      Call Trace:
       <TASK>
       skb_put+0x151/0x210 net/core/skbuff.c:2047
       skb_put_zero include/linux/skbuff.h:2422 [inline]
       cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1131 [inline]
       cdc_ncm_fill_tx_frame+0x11ab/0x3da0 drivers/net/usb/cdc_ncm.c:1308
       cdc_ncm_tx_fixup+0xa3/0x100
      
      Deal with too low values of dwNtbOutMaxSize, clamp it in the range
      [USB_CDC_NCM_NTB_MIN_OUT_SIZE, CDC_NCM_NTB_MAX_SIZE_TX]. We ensure
      enough data space is allocated to handle CDC data by making sure
      dwNtbOutMaxSize is not smaller than USB_CDC_NCM_NTB_MIN_OUT_SIZE.
      
      Fixes: 289507d3 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+9f575a1f15fc0c01ed69@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?extid=b982f1059506db48409d
      Link: https://lore.kernel.org/all/20211202143437.1411410-1-lee.jones@linaro.org/
      Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230517133808.1873695-2-tudor.ambarus@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7e01c7f7
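The size arithmetic and the clamp described in the cdc_ncm commit above can be reproduced with a short sketch. The byte counts (512, 320, 344) come from the commit message. Everything else is an assumption made for illustration: the 64-byte alignment is inferred from the 344b to 384b rounding, the two limit values are placeholders rather than the definitions in the kernel headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder limits; the real values are defined in the kernel headers. */
#define NTB_MIN_OUT_SIZE 2048u  /* stand-in for USB_CDC_NCM_NTB_MIN_OUT_SIZE */
#define NTB_MAX_SIZE_TX  65536u /* stand-in for CDC_NCM_NTB_MAX_SIZE_TX */
#define ALIGN_BYTES      64u    /* assumed alignment (344b rounds up to 384b) */

/* Round size up to the next multiple of a power-of-two alignment. */
static unsigned int align_up(unsigned int size, unsigned int align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Bytes left for payload after reserving the aligned header struct. */
static unsigned int spare_room(unsigned int buf_size, unsigned int hdr_size)
{
    unsigned int reserved = align_up(hdr_size, ALIGN_BYTES);

    return buf_size > reserved ? buf_size - reserved : 0;
}

/* Clamp a device-reported size into [NTB_MIN_OUT_SIZE, NTB_MAX_SIZE_TX]. */
static unsigned int clamp_tx_max(unsigned int ntb_out_max_size)
{
    if (ntb_out_max_size < NTB_MIN_OUT_SIZE)
        return NTB_MIN_OUT_SIZE;
    if (ntb_out_max_size > NTB_MAX_SIZE_TX)
        return NTB_MAX_SIZE_TX;
    return ntb_out_max_size;
}
```

With these helpers, `spare_room(512, 320)` gives the 192 bytes that currently fit the 172 bytes of CDC data, while `spare_room(512, 344)` gives only 128, which matches the `end:0x80` value in the trace. Clamping the reported size up to the minimum removes the dependence on that leftover space.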
  5. 18 May, 2023 2 commits
    • Linus Torvalds's avatar
      Merge tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1f594fe7
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from can, xfrm, bluetooth and netfilter.
      
        Current release - regressions:
      
         - ipv6: fix RCU splat in ipv6_route_seq_show()
      
         - wifi: iwlwifi: disable RFI feature
      
        Previous releases - regressions:
      
         - tcp: fix possible sk_priority leak in tcp_v4_send_reset()
      
         - tipc: do not update mtu if msg_max is too small in mtu negotiation
      
         - netfilter: fix null deref on element insertion
      
         - devlink: change per-devlink netdev notifier to static one
      
         - phylink: fix ksettings_set() ethtool call
      
         - wifi: mac80211: fortify the spinlock against deadlock by interrupt
      
         - wifi: brcmfmac: check for probe() id argument being NULL
      
         - eth: ice:
            - fix undersized tx_flags variable
            - fix ice VF reset during iavf initialization
      
         - eth: hns3: fix sending pfc frames after reset issue
      
        Previous releases - always broken:
      
         - xfrm: release all offloaded policy memory
      
         - nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
      
         - vsock: avoid to close connected socket after the timeout
      
         - dsa: rzn1-a5psw: enable management frames for CPU port
      
         - eth: virtio_net: fix error unwinding of XDP initialization
      
         - eth: tun: fix memory leak for detached NAPI queue.
      
        Misc:
      
         - MAINTAINERS: sctp: move Neil to CREDITS"
      
      * tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
        MAINTAINERS: skip CCing netdev for Bluetooth patches
        mdio_bus: unhide mdio_bus_init prototype
        bridge: always declare tunnel functions
        atm: hide unused procfs functions
        net: isa: include net/Space.h
        Revert "ARM: dts: stm32: add CAN support on stm32f746"
        netfilter: nft_set_rbtree: fix null deref on element insertion
        netfilter: nf_tables: fix nft_trans type confusion
        netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
        net: wwan: t7xx: Ensure init is completed before system sleep
        net: selftests: Fix optstring
        net: pcs: xpcs: fix C73 AN not getting enabled
        net: wwan: iosm: fix NULL pointer dereference when removing device
        vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
        mailmap: add entries for Nikolay Aleksandrov
        igb: fix bit_shift to be in [1..8] range
        net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
        cassini: Fix a memory leak in the error handling path of cas_init_one()
        tun: Fix memory leak for detached NAPI queue.
        can: kvaser_pciefd: Disable interrupts in probe error path
        ...
      1f594fe7
    • Linus Torvalds's avatar
      Merge tag 'media/v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · b802651b
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Several fixes for the dvb core and drivers:
      
         - fix UAF and null pointer de-reference in DVB core
      
         - fix kernel runtime warning for blocking operation in wait_event*()
           in dvb core
      
         - fix write size bug in DVB conditional access core
      
         - fix dvb demux continuity counter debug check logic
      
         - randconfig build fixes in pvrusb2 and mn88443x
      
         - fix memory leak in ttusb-dec
      
         - fix netup_unidvb probe-time error check logic
      
         - improve error handling in dw2102 if it can't retrieve DVB MAC
           address"
      
      * tag 'media/v6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221
        media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
        media: dvb-core: Fix use-after-free due to race at dvb_register_device()
        media: dvb-core: Fix use-after-free due on race condition at dvb_net
        media: dvb-core: Fix use-after-free on race condition at dvb_frontend
        media: mn88443x: fix !CONFIG_OF error by drop of_match_ptr from ID table
        media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
        media: dvb_ca_en50221: fix a size write bug
        media: netup_unidvb: fix irq init by register it at the end of probe
        media: dvb-usb: dw2102: fix uninit-value in su3000_read_mac_address
        media: dvb-usb: digitv: fix null-ptr-deref in digitv_i2c_xfer()
        media: dvb-usb-v2: rtl28xxu: fix null-ptr-deref in rtl28xxu_i2c_xfer
        media: dvb-usb-v2: ce6230: fix null-ptr-deref in ce6230_i2c_master_xfer()
        media: dvb-usb-v2: ec168: fix null-ptr-deref in ec168_i2c_xfer()
        media: dvb-usb: az6027: fix three null-ptr-deref in az6027_i2c_xfer()
        media: netup_unidvb: fix use-after-free at del_timer()
        media: dvb_demux: fix a bug for the continuity counter
        media: pvrusb2: fix DVB_CORE dependency
      b802651b