1. 19 Jul, 2023 9 commits
    • Jakub Kicinski's avatar
      Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 7f5acea7
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-07-17 (iavf)
      
      This series contains updates to iavf driver only.
      
      Ding Hui fixes use-after-free issue by calling netif_napi_del() for all
      allocated q_vectors. He also resolves out-of-bounds issue by not
      updating to new values when timeout is encountered.
      
      Marcin and Ahmed change the way resets are handled so that the callback
      operating under the RTNL lock will wait for the reset to finish, the
      rtnl_lock sensitive functions in reset flow will schedule the netdev update
      for later in order to remove circular dependency with the critical lock.
      
      * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        iavf: fix reset task race with iavf_remove()
        iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
        Revert "iavf: Do not restart Tx queues after reset task failure"
        Revert "iavf: Detach device during reset task"
        iavf: Wait for reset in callbacks which trigger it
        iavf: use internal state to free traffic IRQs
        iavf: Fix out-of-bounds when setting channels on remove
        iavf: Fix use-after-free in free_netdev
      ====================
      
      Link: https://lore.kernel.org/r/20230717175205.3217774-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7f5acea7
    • Jakub Kicinski's avatar
      Merge branch 'tcp-annotate-data-races-in-tcp_rsk-req' · e9b2bd96
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      tcp: annotate data-races in tcp_rsk(req)
      
      Small series addressing two syzbot reports around tcp_rsk(req)
      ====================
      
      Link: https://lore.kernel.org/r/20230717144445.653164-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e9b2bd96
    • Eric Dumazet's avatar
      tcp: annotate data-races around tcp_rsk(req)->ts_recent · eba20811
      Eric Dumazet authored
      TCP request sockets are lockless, tcp_rsk(req)->ts_recent
      can change while being read by another cpu as syzbot noticed.
      
      This is harmless, but we should annotate the known races.
      
      Note that tcp_check_req() changes req->ts_recent a bit early,
      we might change this in the future.
      
      BUG: KCSAN: data-race in tcp_check_req / tcp_check_req
      
      write to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 1:
      tcp_check_req+0x694/0xc70 net/ipv4/tcp_minisocks.c:762
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      do_softirq+0x7e/0xb0 kernel/softirq.c:472
      __local_bh_enable_ip+0x64/0x70 kernel/softirq.c:396
      local_bh_enable+0x1f/0x20 include/linux/bottom_half.h:33
      rcu_read_unlock_bh include/linux/rcupdate.h:843 [inline]
      __dev_queue_xmit+0xabb/0x1d10 net/core/dev.c:4271
      dev_queue_xmit include/linux/netdevice.h:3088 [inline]
      neigh_hh_output include/net/neighbour.h:528 [inline]
      neigh_output include/net/neighbour.h:542 [inline]
      ip_finish_output2+0x700/0x840 net/ipv4/ip_output.c:229
      ip_finish_output+0xf4/0x240 net/ipv4/ip_output.c:317
      NF_HOOK_COND include/linux/netfilter.h:292 [inline]
      ip_output+0xe5/0x1b0 net/ipv4/ip_output.c:431
      dst_output include/net/dst.h:458 [inline]
      ip_local_out net/ipv4/ip_output.c:126 [inline]
      __ip_queue_xmit+0xa4d/0xa70 net/ipv4/ip_output.c:533
      ip_queue_xmit+0x38/0x40 net/ipv4/ip_output.c:547
      __tcp_transmit_skb+0x1194/0x16e0 net/ipv4/tcp_output.c:1399
      tcp_transmit_skb net/ipv4/tcp_output.c:1417 [inline]
      tcp_write_xmit+0x13ff/0x2fd0 net/ipv4/tcp_output.c:2693
      __tcp_push_pending_frames+0x6a/0x1a0 net/ipv4/tcp_output.c:2877
      tcp_push_pending_frames include/net/tcp.h:1952 [inline]
      __tcp_sock_set_cork net/ipv4/tcp.c:3336 [inline]
      tcp_sock_set_cork+0xe8/0x100 net/ipv4/tcp.c:3343
      rds_tcp_xmit_path_complete+0x3b/0x40 net/rds/tcp_send.c:52
      rds_send_xmit+0xf8d/0x1420 net/rds/send.c:422
      rds_send_worker+0x42/0x1d0 net/rds/threads.c:200
      process_one_work+0x3e6/0x750 kernel/workqueue.c:2408
      worker_thread+0x5f2/0xa10 kernel/workqueue.c:2555
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      read to 0xffff88813c8afb84 of 4 bytes by interrupt on cpu 0:
      tcp_check_req+0x32a/0xc70 net/ipv4/tcp_minisocks.c:622
      tcp_v4_rcv+0x12db/0x1b70 net/ipv4/tcp_ipv4.c:2071
      ip_protocol_deliver_rcu+0x356/0x6d0 net/ipv4/ip_input.c:205
      ip_local_deliver_finish+0x13c/0x1a0 net/ipv4/ip_input.c:233
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_local_deliver+0xec/0x1c0 net/ipv4/ip_input.c:254
      dst_input include/net/dst.h:468 [inline]
      ip_rcv_finish net/ipv4/ip_input.c:449 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip_rcv+0x197/0x270 net/ipv4/ip_input.c:569
      __netif_receive_skb_one_core net/core/dev.c:5493 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5607
      process_backlog+0x21f/0x380 net/core/dev.c:5935
      __napi_poll+0x60/0x3b0 net/core/dev.c:6498
      napi_poll net/core/dev.c:6565 [inline]
      net_rx_action+0x32b/0x750 net/core/dev.c:6698
      __do_softirq+0xc1/0x265 kernel/softirq.c:571
      run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
      smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
      kthread+0x1d7/0x210 kernel/kthread.c:379
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
      
      value changed: 0x1cd237f1 -> 0x1cd237f2
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-3-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      eba20811
    • Eric Dumazet's avatar
      tcp: annotate data-races around tcp_rsk(req)->txhash · 5e526552
      Eric Dumazet authored
      TCP request sockets are lockless, some of their fields
      can change while being read by another cpu as syzbot noticed.
      
      This is usually harmless, but we should annotate the known
      races.
      
      This patch takes care of tcp_rsk(req)->txhash,
      a separate one is needed for tcp_rsk(req)->ts_recent.
      
      BUG: KCSAN: data-race in tcp_make_synack / tcp_rtx_synack
      
      write to 0xffff8881362304bc of 4 bytes by task 32083 on cpu 1:
      tcp_rtx_synack+0x9d/0x2a0 net/ipv4/tcp_output.c:4213
      inet_rtx_syn_ack+0x38/0x80 net/ipv4/inet_connection_sock.c:880
      tcp_check_req+0x379/0xc70 net/ipv4/tcp_minisocks.c:665
      tcp_v6_rcv+0x125b/0x1b20 net/ipv6/tcp_ipv6.c:1673
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffff8881362304bc of 4 bytes by task 32078 on cpu 0:
      tcp_make_synack+0x367/0xb40 net/ipv4/tcp_output.c:3663
      tcp_v6_send_synack+0x72/0x420 net/ipv6/tcp_ipv6.c:544
      tcp_conn_request+0x11a8/0x1560 net/ipv4/tcp_input.c:7059
      tcp_v6_conn_request+0x13f/0x180 net/ipv6/tcp_ipv6.c:1175
      tcp_rcv_state_process+0x156/0x1de0 net/ipv4/tcp_input.c:6494
      tcp_v6_do_rcv+0x98a/0xb70 net/ipv6/tcp_ipv6.c:1509
      tcp_v6_rcv+0x17b8/0x1b20 net/ipv6/tcp_ipv6.c:1735
      ip6_protocol_deliver_rcu+0x92f/0xf30 net/ipv6/ip6_input.c:437
      ip6_input_finish net/ipv6/ip6_input.c:482 [inline]
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ip6_input+0xbd/0x1b0 net/ipv6/ip6_input.c:491
      dst_input include/net/dst.h:468 [inline]
      ip6_rcv_finish+0x1e2/0x2e0 net/ipv6/ip6_input.c:79
      NF_HOOK include/linux/netfilter.h:303 [inline]
      ipv6_rcv+0x74/0x150 net/ipv6/ip6_input.c:309
      __netif_receive_skb_one_core net/core/dev.c:5452 [inline]
      __netif_receive_skb+0x90/0x1b0 net/core/dev.c:5566
      netif_receive_skb_internal net/core/dev.c:5652 [inline]
      netif_receive_skb+0x4a/0x310 net/core/dev.c:5711
      tun_rx_batched+0x3bf/0x400
      tun_get_user+0x1d24/0x22b0 drivers/net/tun.c:1997
      tun_chr_write_iter+0x18e/0x240 drivers/net/tun.c:2043
      call_write_iter include/linux/fs.h:1871 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x4ab/0x7d0 fs/read_write.c:584
      ksys_write+0xeb/0x1a0 fs/read_write.c:637
      __do_sys_write fs/read_write.c:649 [inline]
      __se_sys_write fs/read_write.c:646 [inline]
      __x64_sys_write+0x42/0x50 fs/read_write.c:646
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x91d25731 -> 0xe79325cd
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
      
      Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      5e526552
    • Subbaraya Sundeep's avatar
      octeontx2-pf: mcs: Generate hash key using ecb(aes) · e7002b3b
      Subbaraya Sundeep authored
      Hardware generated encryption and ICV tags are found to
      be wrong when tested with IEEE MACSEC test vectors.
      This is because as per the HRM, the hash key (derived by
      AES-ECB block encryption of an all 0s block with the SAK)
      has to be programmed by the software in
      MCSX_RS_MCS_CPM_TX_SLAVE_SA_PLCY_MEM_4X register.
      Hence fix this by generating hash key in software and
      configuring in hardware.
      
      Fixes: c54ffc73 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
      Signed-off-by: default avatarSubbaraya Sundeep <sbhatta@marvell.com>
      Reviewed-by: default avatarKalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Link: https://lore.kernel.org/r/1689574603-28093-1-git-send-email-sbhatta@marvell.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e7002b3b
    • Florian Kauer's avatar
      igc: Prevent garbled TX queue with XDP ZEROCOPY · 78adb4bc
      Florian Kauer authored
      In normal operation, each populated queue item has
      next_to_watch pointing to the last TX desc of the packet,
      while each cleaned item has it set to 0. In particular,
      next_to_use that points to the next (necessarily clean)
      item to use has next_to_watch set to 0.
      
      When the TX queue is used both by an application using
      AF_XDP with ZEROCOPY as well as a second non-XDP application
      generating high traffic, the queue pointers can get in
      an invalid state where next_to_use points to an item
      where next_to_watch is NOT set to 0.
      
      However, the implementation assumes at several places
      that this is never the case, so if it does hold,
      bad things happen. In particular, within the loop inside
      of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
      Finally, this prevents any further transmission via
      this queue and it never gets unblocked or signaled.
      Secondly, if the queue is in this garbled state,
      the inner loop of igc_clean_tx_ring() will never terminate,
      completely hogging a CPU core.
      
      The reason is that igc_xdp_xmit_zc() reads next_to_use
      before acquiring the lock, and writing it back
      (potentially unmodified) later. If it got modified
      before locking, the outdated next_to_use is written
      pointing to an item that was already used elsewhere
      (and thus next_to_watch got written).
      
      Fixes: 9acf59a7 ("igc: Enable TX via AF_XDP zero-copy")
      Signed-off-by: default avatarFlorian Kauer <florian.kauer@linutronix.de>
      Reviewed-by: Kurt Kanzenbach's avatarKurt Kanzenbach <kurt@linutronix.de>
      Tested-by: Kurt Kanzenbach's avatarKurt Kanzenbach <kurt@linutronix.de>
      Acked-by: default avatarVinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Tested-by: default avatarNaama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      78adb4bc
    • Jakub Kicinski's avatar
      Merge tag 'linux-can-fixes-for-6.5-20230717' of... · 936fd2c5
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2023-07-17
      
      The 1st patch is by Ziyang Xuan and fixes a possible memory leak in
      the receiver handling in the CAN RAW protocol.
      
      YueHaibing contributes a use after free in bcm_proc_show() of the
      Broad Cast Manager (BCM) CAN protocol.
      
      The next 2 patches are by me and fix a possible null pointer
      dereference in the RX path of the gs_usb driver with activated
      hardware timestamps and the candlelight firmware.
      
      The last patch is by Fedor Ross, Marek Vasut and me and targets the
      mcp251xfd driver. The polling timeout of __mcp251xfd_chip_set_mode()
      is increased to fix bus joining on busy CAN buses and very low bit
      rate.
      
      * tag 'linux-can-fixes-for-6.5-20230717' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
        can: gs_usb: fix time stamp counter initialization
        can: gs_usb: gs_can_open(): improve error handling
        can: bcm: Fix UAF in bcm_proc_show()
        can: raw: fix receiver memory leak
      ====================
      
      Link: https://lore.kernel.org/r/20230717180938.230816-1-mkl@pengutronix.deSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      936fd2c5
    • John Fastabend's avatar
      mailmap: Add entry for old intel email · 195e903b
      John Fastabend authored
      Fix old email to avoid bouncing email from net/drivers and older
      netdev work. Anyways my @intel email hasn't been active for years.
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20230717173306.38407-1-john.fastabend@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      195e903b
    • Shannon Nelson's avatar
      mailmap: add entries for past lives · d1998e50
      Shannon Nelson authored
      Update old emails for my current work email.
      Signed-off-by: default avatarShannon Nelson <shannon.nelson@amd.com>
      Link: https://lore.kernel.org/r/20230717193242.43670-1-shannon.nelson@amd.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d1998e50
  2. 18 Jul, 2023 7 commits
  3. 17 Jul, 2023 22 commits
    • Fedor Ross's avatar
      can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout · 9efa1a54
      Fedor Ross authored
      The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
      mode' or . The maximum length of a CAN frame is 736 bits (64 data
      bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
      spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
      MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
      
      Otherwise during polling for the CAN controller to enter 'Normal CAN
      2.0 mode' the timeout limit is exceeded and the configuration fails
      with:
      
      | $ ip link set dev can1 up type can bitrate 10000
      | [  731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
      | [  731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
      | [  731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
      | RTNETLINK answers: Connection timed out
      
      Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
      maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
      EFF mode, worst case bit stuffing and interframe spacing at the
      current bit rate.
      
      For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
      that holds the max frame length in bits, which is 736. This can be
      replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
      a cleanup patch later.
      
      Fixes: 55e5b97f ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
      Signed-off-by: default avatarFedor Ross <fedor.ross@ifm.com>
      Signed-off-by: default avatarMarek Vasut <marex@denx.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5-1-06600f34c684@pengutronix.deSigned-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      9efa1a54
    • Ahmed Zaki's avatar
      iavf: fix reset task race with iavf_remove() · c34743da
      Ahmed Zaki authored
      The reset task is currently scheduled from the watchdog or adminq tasks.
      First, all direct calls to schedule the reset task are replaced with the
      iavf_schedule_reset(), which is modified to accept the flag showing the
      type of reset.
      
      To prevent the reset task from starting once iavf_remove() starts, we need
      to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
      easily added to iavf_schedule_reset().
      
      Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
      It is redundant since all callers who set the flag immediately schedules
      the reset task.
      
      Fixes: 3ccd54ef ("iavf: Fix init state closure on remove")
      Fixes: 14756b2a ("iavf: Fix __IAVF_RESETTING state usage")
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      c34743da
    • Ahmed Zaki's avatar
      iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies · d1639a17
      Ahmed Zaki authored
      A driver's lock (crit_lock) is used to serialize all the driver's tasks.
      Lockdep, however, shows a circular dependency between rtnl and
      crit_lock. This happens when an ndo that already holds the rtnl requests
      the driver to reset, since the reset task (in some paths) tries to grab
      rtnl to either change real number of queues of update netdev features.
      
        [566.241851] ======================================================
        [566.241893] WARNING: possible circular locking dependency detected
        [566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G           OE
        [566.241984] ------------------------------------------------------
        [566.242025] repro.sh/2604 is trying to acquire lock:
        [566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
        [566.242167]
                     but task is already holding lock:
        [566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
        [566.242300]
                     which lock already depends on the new lock.
      
        [566.242353]
                     the existing dependency chain (in reverse order) is:
        [566.242401]
                     -> #1 (rtnl_mutex){+.+.}-{3:3}:
        [566.242451]        __mutex_lock+0xc1/0xbb0
        [566.242489]        iavf_init_interrupt_scheme+0x179/0x440 [iavf]
        [566.242560]        iavf_watchdog_task+0x80b/0x1400 [iavf]
        [566.242627]        process_one_work+0x2b3/0x560
        [566.242663]        worker_thread+0x4f/0x3a0
        [566.242696]        kthread+0xf2/0x120
        [566.242730]        ret_from_fork+0x29/0x50
        [566.242763]
                     -> #0 (&adapter->crit_lock){+.+.}-{3:3}:
        [566.242815]        __lock_acquire+0x15ff/0x22b0
        [566.242869]        lock_acquire+0xd2/0x2c0
        [566.242901]        __mutex_lock+0xc1/0xbb0
        [566.242934]        iavf_close+0x3c/0x240 [iavf]
        [566.242997]        __dev_close_many+0xac/0x120
        [566.243036]        dev_close_many+0x8b/0x140
        [566.243071]        unregister_netdevice_many_notify+0x165/0x7c0
        [566.243116]        unregister_netdevice_queue+0xd3/0x110
        [566.243157]        iavf_remove+0x6c1/0x730 [iavf]
        [566.243217]        pci_device_remove+0x33/0xa0
        [566.243257]        device_release_driver_internal+0x1bc/0x240
        [566.243299]        pci_stop_bus_device+0x6c/0x90
        [566.243338]        pci_stop_and_remove_bus_device+0xe/0x20
        [566.243380]        pci_iov_remove_virtfn+0xd1/0x130
        [566.243417]        sriov_disable+0x34/0xe0
        [566.243448]        ice_free_vfs+0x2da/0x330 [ice]
        [566.244383]        ice_sriov_configure+0x88/0xad0 [ice]
        [566.245353]        sriov_numvfs_store+0xde/0x1d0
        [566.246156]        kernfs_fop_write_iter+0x15e/0x210
        [566.246921]        vfs_write+0x288/0x530
        [566.247671]        ksys_write+0x74/0xf0
        [566.248408]        do_syscall_64+0x58/0x80
        [566.249145]        entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [566.249886]
                       other info that might help us debug this:
      
        [566.252014]  Possible unsafe locking scenario:
      
        [566.253432]        CPU0                    CPU1
        [566.254118]        ----                    ----
        [566.254800]   lock(rtnl_mutex);
        [566.255514]                                lock(&adapter->crit_lock);
        [566.256233]                                lock(rtnl_mutex);
        [566.256897]   lock(&adapter->crit_lock);
        [566.257388]
                        *** DEADLOCK ***
      
      The deadlock can be triggered by a script that is continuously resetting
      the VF adapter while doing other operations requiring RTNL, e.g:
      
      	while :; do
      		ip link set $VF up
      		ethtool --set-channels $VF combined 2
      		ip link set $VF down
      		ip link set $VF up
      		ethtool --set-channels $VF combined 4
      		ip link set $VF down
      	done
      
      Any operation that triggers a reset can substitute "ethtool --set-channles"
      
      As a fix, add a new task "finish_config" that do all the work which
      needs rtnl lock. With the exception of iavf_remove(), all work that
      require rtnl should be called from this task.
      
      As for iavf_remove(), at the point where we need to call
      unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
      task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
      finish_config work cannot restart after that since the task is guarded
      by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
      
      Fixes: 5ac49f3c ("iavf: use mutexes for locking of critical sections")
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d1639a17
    • Marcin Szycik's avatar
      Revert "iavf: Do not restart Tx queues after reset task failure" · d916d273
      Marcin Szycik authored
      This reverts commit 08f1c147.
      
      Netdev is no longer being detached during reset, so this fix can be
      reverted. We leave the removal of "hacky" IFF_UP flag update.
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d916d273
    • Marcin Szycik's avatar
      Revert "iavf: Detach device during reset task" · d2806d96
      Marcin Szycik authored
      This reverts commit aa626da9.
      
      Detaching device during reset was not fully fixing the rtnl locking issue,
      as there could be a situation where callback was already in progress before
      detaching netdev.
      
      Furthermore, detaching netdevice causes TX timeouts if traffic is running.
      To reproduce:
      
      ip netns exec ns1 iperf3 -c $PEER_IP -t 600 --logfile /dev/null &
      while :; do
              for i in 200 7000 400 5000 300 3000 ; do
      		ip netns exec ns1 ip link set $VF1 mtu $i
                      sleep 2
              done
              sleep 10
      done
      
      Currently, callbacks such as iavf_change_mtu() wait for the reset.
      If the reset fails to acquire the rtnl_lock, they schedule the netdev
      update for later while continuing the reset flow. Operations like MTU
      changes are performed under the rtnl_lock. Therefore, when the operation
      finishes, another callback that uses rtnl_lock can start.
      Signed-off-by: default avatarDawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      d2806d96
    • Marcin Szycik's avatar
      iavf: Wait for reset in callbacks which trigger it · c2ed2403
      Marcin Szycik authored
      There was a fail when trying to add the interface to bonding
      right after changing the MTU on the interface. It was caused
      by bonding interface unable to open the interface due to
      interface being in __RESETTING state because of MTU change.
      
      Add new reset_waitqueue to indicate that reset has finished.
      
      Add waiting for reset to finish in callbacks which trigger hw reset:
      iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
      We use a 5000ms timeout period because on Hyper-V based systems,
      this operation takes around 3000-4000ms. In normal circumstances,
      it doesn't take more than 500ms to complete.
      
      Add a function iavf_wait_for_reset() to reuse waiting for reset code and
      use it also in iavf_set_channels(), which already waits for reset.
      We don't use error handling in iavf_set_channels() as this could
      cause the device to be in incorrect state if the reset was scheduled
      but hit timeout or the waitng function was interrupted by a signal.
      
      Fixes: 4e5e6b5d ("iavf: Fix return of set the new channel count")
      Signed-off-by: default avatarMarcin Szycik <marcin.szycik@linux.intel.com>
      Co-developed-by: default avatarDawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: default avatarDawid Wesierski <dawidx.wesierski@intel.com>
      Signed-off-by: default avatarSylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
      Signed-off-by: default avatarKamil Maziarz <kamil.maziarz@intel.com>
      Signed-off-by: default avatarMateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      c2ed2403
    • Ahmed Zaki's avatar
      iavf: use internal state to free traffic IRQs · a77ed5c5
      Ahmed Zaki authored
      If the system tries to close the netdev while iavf_reset_task() is
      running, __LINK_STATE_START will be cleared and netif_running() will
      return false in iavf_reinit_interrupt_scheme(). This will result in
      iavf_free_traffic_irqs() not being called and a leak as follows:
      
          [7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
          [7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
      
      is shown when pci_disable_msix() is later called. Fix by using the
      internal adapter state. The traffic IRQs will always exist if
      state == __IAVF_RUNNING.
      
      Fixes: 5b36e8d0 ("i40evf: Enable VF to request an alternate queue allocation")
      Signed-off-by: default avatarAhmed Zaki <ahmed.zaki@intel.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      a77ed5c5
    • Ding Hui's avatar
      iavf: Fix out-of-bounds when setting channels on remove · 7c4bced3
      Ding Hui authored
      If we set channels greater during iavf_remove(), and waiting reset done
      would be timeout, then returned with error but changed num_active_queues
      directly, that will lead to OOB like the following logs. Because the
      num_active_queues is greater than tx/rx_rings[] allocated actually.
      
      Reproducer:
      
        [root@host ~]# cat repro.sh
        #!/bin/bash
      
        pf_dbsf="0000:41:00.0"
        vf0_dbsf="0000:41:02.0"
        g_pids=()
      
        function do_set_numvf()
        {
            echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
            echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
        }
      
        function do_set_channel()
        {
            local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
            [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
            ifconfig $nic 192.168.18.5 netmask 255.255.255.0
            ifconfig $nic up
            ethtool -L $nic combined 1
            ethtool -L $nic combined 4
            sleep $((RANDOM%3))
        }
      
        function on_exit()
        {
            local pid
            for pid in "${g_pids[@]}"; do
                kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
            done
            g_pids=()
        }
      
        trap "on_exit; exit" EXIT
      
        while :; do do_set_numvf ; done &
        g_pids+=($!)
        while :; do do_set_channel ; done &
        g_pids+=($!)
      
        wait
      
      Result:
      
      [ 3506.152887] iavf 0000:41:02.0: Removing device
      [ 3510.400799] ==================================================================
      [ 3510.400820] BUG: KASAN: slab-out-of-bounds in iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400823] Read of size 8 at addr ffff88b6f9311008 by task repro.sh/55536
      [ 3510.400823]
      [ 3510.400830] CPU: 101 PID: 55536 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 3510.400832] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
      [ 3510.400835] Call Trace:
      [ 3510.400851]  dump_stack+0x71/0xab
      [ 3510.400860]  print_address_description+0x6b/0x290
      [ 3510.400865]  ? iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400868]  kasan_report+0x14a/0x2b0
      [ 3510.400873]  iavf_free_all_tx_resources+0x156/0x160 [iavf]
      [ 3510.400880]  iavf_remove+0x2b6/0xc70 [iavf]
      [ 3510.400884]  ? iavf_free_all_rx_resources+0x160/0x160 [iavf]
      [ 3510.400891]  ? wait_woken+0x1d0/0x1d0
      [ 3510.400895]  ? notifier_call_chain+0xc1/0x130
      [ 3510.400903]  pci_device_remove+0xa8/0x1f0
      [ 3510.400910]  device_release_driver_internal+0x1c6/0x460
      [ 3510.400916]  pci_stop_bus_device+0x101/0x150
      [ 3510.400919]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 3510.400924]  pci_iov_remove_virtfn+0x187/0x420
      [ 3510.400927]  ? pci_iov_add_virtfn+0xe10/0xe10
      [ 3510.400929]  ? pci_get_subsys+0x90/0x90
      [ 3510.400932]  sriov_disable+0xed/0x3e0
      [ 3510.400936]  ? bus_find_device+0x12d/0x1a0
      [ 3510.400953]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 3510.400966]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
      [ 3510.400968]  ? pci_get_device+0x7c/0x90
      [ 3510.400970]  ? pci_get_subsys+0x90/0x90
      [ 3510.400982]  ? pci_vfs_assigned.part.7+0x144/0x210
      [ 3510.400987]  ? __mutex_lock_slowpath+0x10/0x10
      [ 3510.400996]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 3510.401001]  sriov_numvfs_store+0x214/0x290
      [ 3510.401005]  ? sriov_totalvfs_show+0x30/0x30
      [ 3510.401007]  ? __mutex_lock_slowpath+0x10/0x10
      [ 3510.401011]  ? __check_object_size+0x15a/0x350
      [ 3510.401018]  kernfs_fop_write+0x280/0x3f0
      [ 3510.401022]  vfs_write+0x145/0x440
      [ 3510.401025]  ksys_write+0xab/0x160
      [ 3510.401028]  ? __ia32_sys_read+0xb0/0xb0
      [ 3510.401031]  ? fput_many+0x1a/0x120
      [ 3510.401032]  ? filp_close+0xf0/0x130
      [ 3510.401038]  do_syscall_64+0xa0/0x370
      [ 3510.401041]  ? page_fault+0x8/0x30
      [ 3510.401043]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 3510.401073] RIP: 0033:0x7f3a9bb842c0
      [ 3510.401079] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
      [ 3510.401080] RSP: 002b:00007ffc05f1fe18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 3510.401083] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f3a9bb842c0
      [ 3510.401085] RDX: 0000000000000002 RSI: 0000000002327408 RDI: 0000000000000001
      [ 3510.401086] RBP: 0000000002327408 R08: 00007f3a9be53780 R09: 00007f3a9c8a4700
      [ 3510.401086] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
      [ 3510.401087] R13: 0000000000000001 R14: 00007f3a9be52620 R15: 0000000000000001
      [ 3510.401090]
      [ 3510.401093] Allocated by task 76795:
      [ 3510.401098]  kasan_kmalloc+0xa6/0xd0
      [ 3510.401099]  __kmalloc+0xfb/0x200
      [ 3510.401104]  iavf_init_interrupt_scheme+0x26f/0x1310 [iavf]
      [ 3510.401108]  iavf_watchdog_task+0x1d58/0x4050 [iavf]
      [ 3510.401114]  process_one_work+0x56a/0x11f0
      [ 3510.401115]  worker_thread+0x8f/0xf40
      [ 3510.401117]  kthread+0x2a0/0x390
      [ 3510.401119]  ret_from_fork+0x1f/0x40
      [ 3510.401122]  0xffffffffffffffff
      [ 3510.401123]
      
      In timeout handling, we should keep the original num_active_queues
      and reset num_req_queues to 0.
      
      Fixes: 4e5e6b5d ("iavf: Fix return of set the new channel count")
      Signed-off-by: default avatarDing Hui <dinghui@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Cc: Huang Cun <huangcun@sangfor.com.cn>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      7c4bced3
    • Ding Hui's avatar
      iavf: Fix use-after-free in free_netdev · 5f4fa167
      Ding Hui authored
      We do netif_napi_add() for all allocated q_vectors[], but potentially
      do netif_napi_del() for part of them, then kfree q_vectors and leave
      invalid pointers at dev->napi_list.
      
      Reproducer:
      
        [root@host ~]# cat repro.sh
        #!/bin/bash
      
        pf_dbsf="0000:41:00.0"
        vf0_dbsf="0000:41:02.0"
        g_pids=()
      
        function do_set_numvf()
        {
            echo 2 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
            echo 0 >/sys/bus/pci/devices/${pf_dbsf}/sriov_numvfs
            sleep $((RANDOM%3+1))
        }
      
        function do_set_channel()
        {
            local nic=$(ls -1 --indicator-style=none /sys/bus/pci/devices/${vf0_dbsf}/net/)
            [ -z "$nic" ] && { sleep $((RANDOM%3)) ; return 1; }
            ifconfig $nic 192.168.18.5 netmask 255.255.255.0
            ifconfig $nic up
            ethtool -L $nic combined 1
            ethtool -L $nic combined 4
            sleep $((RANDOM%3))
        }
      
        function on_exit()
        {
            local pid
            for pid in "${g_pids[@]}"; do
                kill -0 "$pid" &>/dev/null && kill "$pid" &>/dev/null
            done
            g_pids=()
        }
      
        trap "on_exit; exit" EXIT
      
        while :; do do_set_numvf ; done &
        g_pids+=($!)
        while :; do do_set_channel ; done &
        g_pids+=($!)
      
        wait
      
      Result:
      
      [ 4093.900222] ==================================================================
      [ 4093.900230] BUG: KASAN: use-after-free in free_netdev+0x308/0x390
      [ 4093.900232] Read of size 8 at addr ffff88b4dc145640 by task repro.sh/6699
      [ 4093.900233]
      [ 4093.900236] CPU: 10 PID: 6699 Comm: repro.sh Kdump: loaded Tainted: G           O     --------- -t - 4.18.0 #1
      [ 4093.900238] Hardware name: Powerleader PR2008AL/H12DSi-N6, BIOS 2.0 04/09/2021
      [ 4093.900239] Call Trace:
      [ 4093.900244]  dump_stack+0x71/0xab
      [ 4093.900249]  print_address_description+0x6b/0x290
      [ 4093.900251]  ? free_netdev+0x308/0x390
      [ 4093.900252]  kasan_report+0x14a/0x2b0
      [ 4093.900254]  free_netdev+0x308/0x390
      [ 4093.900261]  iavf_remove+0x825/0xd20 [iavf]
      [ 4093.900265]  pci_device_remove+0xa8/0x1f0
      [ 4093.900268]  device_release_driver_internal+0x1c6/0x460
      [ 4093.900271]  pci_stop_bus_device+0x101/0x150
      [ 4093.900273]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 4093.900275]  pci_iov_remove_virtfn+0x187/0x420
      [ 4093.900277]  ? pci_iov_add_virtfn+0xe10/0xe10
      [ 4093.900278]  ? pci_get_subsys+0x90/0x90
      [ 4093.900280]  sriov_disable+0xed/0x3e0
      [ 4093.900282]  ? bus_find_device+0x12d/0x1a0
      [ 4093.900290]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 4093.900298]  ? i40e_reset_all_vfs+0x880/0x880 [i40e]
      [ 4093.900299]  ? pci_get_device+0x7c/0x90
      [ 4093.900300]  ? pci_get_subsys+0x90/0x90
      [ 4093.900306]  ? pci_vfs_assigned.part.7+0x144/0x210
      [ 4093.900309]  ? __mutex_lock_slowpath+0x10/0x10
      [ 4093.900315]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 4093.900318]  sriov_numvfs_store+0x214/0x290
      [ 4093.900320]  ? sriov_totalvfs_show+0x30/0x30
      [ 4093.900321]  ? __mutex_lock_slowpath+0x10/0x10
      [ 4093.900323]  ? __check_object_size+0x15a/0x350
      [ 4093.900326]  kernfs_fop_write+0x280/0x3f0
      [ 4093.900329]  vfs_write+0x145/0x440
      [ 4093.900330]  ksys_write+0xab/0x160
      [ 4093.900332]  ? __ia32_sys_read+0xb0/0xb0
      [ 4093.900334]  ? fput_many+0x1a/0x120
      [ 4093.900335]  ? filp_close+0xf0/0x130
      [ 4093.900338]  do_syscall_64+0xa0/0x370
      [ 4093.900339]  ? page_fault+0x8/0x30
      [ 4093.900341]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 4093.900357] RIP: 0033:0x7f16ad4d22c0
      [ 4093.900359] Code: 73 01 c3 48 8b 0d d8 cb 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 89 24 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 fe dd 01 00 48 89 04 24
      [ 4093.900360] RSP: 002b:00007ffd6491b7f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ 4093.900362] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f16ad4d22c0
      [ 4093.900363] RDX: 0000000000000002 RSI: 0000000001a41408 RDI: 0000000000000001
      [ 4093.900364] RBP: 0000000001a41408 R08: 00007f16ad7a1780 R09: 00007f16ae1f2700
      [ 4093.900364] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
      [ 4093.900365] R13: 0000000000000001 R14: 00007f16ad7a0620 R15: 0000000000000001
      [ 4093.900367]
      [ 4093.900368] Allocated by task 820:
      [ 4093.900371]  kasan_kmalloc+0xa6/0xd0
      [ 4093.900373]  __kmalloc+0xfb/0x200
      [ 4093.900376]  iavf_init_interrupt_scheme+0x63b/0x1320 [iavf]
      [ 4093.900380]  iavf_watchdog_task+0x3d51/0x52c0 [iavf]
      [ 4093.900382]  process_one_work+0x56a/0x11f0
      [ 4093.900383]  worker_thread+0x8f/0xf40
      [ 4093.900384]  kthread+0x2a0/0x390
      [ 4093.900385]  ret_from_fork+0x1f/0x40
      [ 4093.900387]  0xffffffffffffffff
      [ 4093.900387]
      [ 4093.900388] Freed by task 6699:
      [ 4093.900390]  __kasan_slab_free+0x137/0x190
      [ 4093.900391]  kfree+0x8b/0x1b0
      [ 4093.900394]  iavf_free_q_vectors+0x11d/0x1a0 [iavf]
      [ 4093.900397]  iavf_remove+0x35a/0xd20 [iavf]
      [ 4093.900399]  pci_device_remove+0xa8/0x1f0
      [ 4093.900400]  device_release_driver_internal+0x1c6/0x460
      [ 4093.900401]  pci_stop_bus_device+0x101/0x150
      [ 4093.900402]  pci_stop_and_remove_bus_device+0xe/0x20
      [ 4093.900403]  pci_iov_remove_virtfn+0x187/0x420
      [ 4093.900404]  sriov_disable+0xed/0x3e0
      [ 4093.900409]  i40e_free_vfs+0x754/0x1210 [i40e]
      [ 4093.900415]  i40e_pci_sriov_configure+0x1fa/0x2e0 [i40e]
      [ 4093.900416]  sriov_numvfs_store+0x214/0x290
      [ 4093.900417]  kernfs_fop_write+0x280/0x3f0
      [ 4093.900418]  vfs_write+0x145/0x440
      [ 4093.900419]  ksys_write+0xab/0x160
      [ 4093.900420]  do_syscall_64+0xa0/0x370
      [ 4093.900421]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      [ 4093.900422]  0xffffffffffffffff
      [ 4093.900422]
      [ 4093.900424] The buggy address belongs to the object at ffff88b4dc144200
                      which belongs to the cache kmalloc-8k of size 8192
      [ 4093.900425] The buggy address is located 5184 bytes inside of
                      8192-byte region [ffff88b4dc144200, ffff88b4dc146200)
      [ 4093.900425] The buggy address belongs to the page:
      [ 4093.900427] page:ffffea00d3705000 refcount:1 mapcount:0 mapping:ffff88bf04415c80 index:0x0 compound_mapcount: 0
      [ 4093.900430] flags: 0x10000000008100(slab|head)
      [ 4093.900433] raw: 0010000000008100 dead000000000100 dead000000000200 ffff88bf04415c80
      [ 4093.900434] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000
      [ 4093.900434] page dumped because: kasan: bad access detected
      [ 4093.900435]
      [ 4093.900435] Memory state around the buggy address:
      [ 4093.900436]  ffff88b4dc145500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900437]  ffff88b4dc145580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900438] >ffff88b4dc145600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900438]                                            ^
      [ 4093.900439]  ffff88b4dc145680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900440]  ffff88b4dc145700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 4093.900440] ==================================================================
      
      Although the patch #2 (of 2) can avoid the issue triggered by this
      repro.sh, there still are other potential risks that if num_active_queues
      is changed to less than allocated q_vectors[] by unexpected, the
      mismatched netif_napi_add/del() can also cause UAF.
      
      Since we actually call netif_napi_add() for all allocated q_vectors
      unconditionally in iavf_alloc_q_vectors(), so we should fix it by
      letting netif_napi_del() match to netif_napi_add().
      
      Fixes: 5eae00c5 ("i40evf: main driver core")
      Signed-off-by: default avatarDing Hui <dinghui@sangfor.com.cn>
      Cc: Donglin Peng <pengdonglin@sangfor.com.cn>
      Cc: Huang Cun <huangcun@sangfor.com.cn>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Reviewed-by: default avatarMadhu Chittim <madhu.chittim@intel.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Tested-by: default avatarRafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
      5f4fa167
    • Marc Kleine-Budde's avatar
      Merge patch series "can: gs_usb: fix time stamp counter initialization" · dc050849
      Marc Kleine-Budde authored
      Marc Kleine-Budde <mkl@pengutronix.de> says:
      
      During testing I noticed a crash if unloading/loading the gs_usb
      driver during high CAN bus load.
      
      The current version of the candlelight firmware doesn't flush the
      queues of the received CAN frames during the reset command. This leads
      to a crash if hardware timestamps are enabled, and an URB from the
      device is received before the cycle counter/time counter
      infrastructure has been setup.
      
      First clean up then error handling in gs_can_open().
      
      Then, fix the problem by converting the cycle counter/time counter
      infrastructure from a per-channel to per-device and set it up before
      submitting RX-URBs to the USB stack.
      
      Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-0-9017cefcd9d5@pengutronix.deSigned-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      dc050849
    • Marc Kleine-Budde's avatar
      can: gs_usb: fix time stamp counter initialization · 5886e4d5
      Marc Kleine-Budde authored
      If the gs_usb device driver is unloaded (or unbound) before the
      interface is shut down, the USB stack first calls the struct
      usb_driver::disconnect and then the struct net_device_ops::ndo_stop
      callback.
      
      In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more
      RX'ed CAN frames are send from the USB device to the host. Later in
      gs_can_close() a reset control message is send to each CAN channel to
      remove the controller from the CAN bus. In this race window the USB
      device can still receive CAN frames from the bus and internally queue
      them to be send to the host.
      
      At least in the current version of the candlelight firmware, the queue
      of received CAN frames is not emptied during the reset command. After
      loading (or binding) the gs_usb driver, new URBs are submitted during
      the struct net_device_ops::ndo_open callback and the candlelight
      firmware starts sending its already queued CAN frames to the host.
      
      However, this scenario was not considered when implementing the
      hardware timestamp function. The cycle counter/time counter
      infrastructure is set up (gs_usb_timestamp_init()) after the USBs are
      submitted, resulting in a NULL pointer dereference if
      timecounter_cyc2time() (via the call chain:
      gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() ->
      gs_usb_skb_set_timestamp()) is called too early.
      
      Move the gs_usb_timestamp_init() function before the URBs are
      submitted to fix this problem.
      
      For a comprehensive solution, we need to consider gs_usb devices with
      more than 1 channel. The cycle counter/time counter infrastructure is
      setup per channel, but the RX URBs are per device. Once gs_can_open()
      of _a_ channel has been called, and URBs have been submitted, the
      gs_usb_receive_bulk_callback() can be called for _all_ available
      channels, even for channels that are not running, yet. As cycle
      counter/time counter has not set up, this will again lead to a NULL
      pointer dereference.
      
      Convert the cycle counter/time counter from a "per channel" to a "per
      device" functionality. Also set it up, before submitting any URBs to
      the device.
      
      Further in gs_usb_receive_bulk_callback(), don't process any URBs for
      not started CAN channels, only resubmit the URB.
      
      Fixes: 45dfa45f ("can: gs_usb: add RX and TX hardware timestamp support")
      Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-1623532076
      Cc: stable@vger.kernel.org
      Cc: John Whittington <git@jbrengineering.co.uk>
      Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-2-9017cefcd9d5@pengutronix.deSigned-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      5886e4d5
    • Marc Kleine-Budde's avatar
      can: gs_usb: gs_can_open(): improve error handling · 2603be9e
      Marc Kleine-Budde authored
      The gs_usb driver handles USB devices with more than 1 CAN channel.
      The RX path for all channels share the same bulk endpoint (the
      transmitted bulk data encodes the channel number). These per-device
      resources are allocated and submitted by the first opened channel.
      
      During this allocation, the resources are either released immediately
      in case of a failure or the URBs are anchored. All anchored URBs are
      finally killed with gs_usb_disconnect().
      
      Currently, gs_can_open() returns with an error if the allocation of a
      URB or a buffer fails. However, if usb_submit_urb() fails, the driver
      continues with the URBs submitted so far, even if no URBs were
      successfully submitted.
      
      Treat every error as fatal and free all allocated resources
      immediately.
      
      Switch to goto-style error handling, to prepare the driver for more
      per-device resource allocation.
      
      Cc: stable@vger.kernel.org
      Cc: John Whittington <git@jbrengineering.co.uk>
      Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-9017cefcd9d5@pengutronix.deSigned-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      2603be9e
    • YueHaibing's avatar
      can: bcm: Fix UAF in bcm_proc_show() · 55c3b960
      YueHaibing authored
      BUG: KASAN: slab-use-after-free in bcm_proc_show+0x969/0xa80
      Read of size 8 at addr ffff888155846230 by task cat/7862
      
      CPU: 1 PID: 7862 Comm: cat Not tainted 6.5.0-rc1-00153-gc8746099c197 #230
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0xd5/0x150
       print_report+0xc1/0x5e0
       kasan_report+0xba/0xf0
       bcm_proc_show+0x969/0xa80
       seq_read_iter+0x4f6/0x1260
       seq_read+0x165/0x210
       proc_reg_read+0x227/0x300
       vfs_read+0x1d5/0x8d0
       ksys_read+0x11e/0x240
       do_syscall_64+0x35/0xb0
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Allocated by task 7846:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       __kasan_kmalloc+0x9e/0xa0
       bcm_sendmsg+0x264b/0x44e0
       sock_sendmsg+0xda/0x180
       ____sys_sendmsg+0x735/0x920
       ___sys_sendmsg+0x11d/0x1b0
       __sys_sendmsg+0xfa/0x1d0
       do_syscall_64+0x35/0xb0
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 7846:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_save_free_info+0x27/0x40
       ____kasan_slab_free+0x161/0x1c0
       slab_free_freelist_hook+0x119/0x220
       __kmem_cache_free+0xb4/0x2e0
       rcu_core+0x809/0x1bd0
      
      bcm_op is freed before procfs entry be removed in bcm_release(),
      this lead to bcm_proc_show() may read the freed bcm_op.
      
      Fixes: ffd980f9 ("[CAN]: Add broadcast manager (bcm) protocol")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230715092543.15548-1-yuehaibing@huawei.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      55c3b960
    • Ziyang Xuan's avatar
      can: raw: fix receiver memory leak · ee8b94c8
      Ziyang Xuan authored
      Got kmemleak errors with the following ltp can_filter testcase:
      
      for ((i=1; i<=100; i++))
      do
              ./can_filter &
              sleep 0.1
      done
      
      ==============================================================
      [<00000000db4a4943>] can_rx_register+0x147/0x360 [can]
      [<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw]
      [<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0
      [<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70
      [<00000000fd468496>] do_syscall_64+0x33/0x40
      [<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      It's a bug in the concurrent scenario of unregister_netdevice_many()
      and raw_release() as following:
      
                   cpu0                                        cpu1
      unregister_netdevice_many(can_dev)
        unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this
        net_set_todo(can_dev)
      						raw_release(can_socket)
      						  dev = dev_get_by_index(, ro->ifindex); // dev == NULL
      						  if (dev) { // receivers in dev_rcv_lists not free because dev is NULL
      						    raw_disable_allfilters(, dev, );
      						    dev_put(dev);
      						  }
      						  ...
      						  ro->bound = 0;
      						  ...
      
      call_netdevice_notifiers(NETDEV_UNREGISTER, )
        raw_notify(, NETDEV_UNREGISTER, )
          if (ro->bound) // invalid because ro->bound has been set 0
            raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
      
      Add a net_device pointer member in struct raw_sock to record bound
      can_dev, and use rtnl_lock to serialize raw_socket members between
      raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use
      ro->dev to decide whether to free receivers in dev_rcv_lists.
      
      Fixes: 8d0caedb ("can: bcm/raw/isotp: use per module netdevice notifier")
      Reviewed-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: default avatarOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: default avatarZiyang Xuan <william.xuanziyang@huawei.com>
      Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarMarc Kleine-Budde <mkl@pengutronix.de>
      ee8b94c8
    • Heiner Kallweit's avatar
      r8169: fix ASPM-related problem for chip version 42 and 43 · 162d626f
      Heiner Kallweit authored
      Referenced commit missed that for chip versions 42 and 43 ASPM
      remained disabled in the respective rtl_hw_start_...() routines.
      This resulted in problems as described in the referenced bug
      ticket. Therefore re-instantiate the previous logic.
      
      Fixes: 5fc3f6c9 ("r8169: consolidate disabling ASPM before EPHY access")
      Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217635Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      162d626f
    • Tristram Ha's avatar
      net: dsa: microchip: correct KSZ8795 static MAC table access · 4bdf79d6
      Tristram Ha authored
      The KSZ8795 driver code was modified to use on KSZ8863/73, which has
      different register definitions.  Some of the new KSZ8795 register
      information are wrong compared to previous code.
      
      KSZ8795 also behaves differently in that the STATIC_MAC_TABLE_USE_FID
      and STATIC_MAC_TABLE_FID bits are off by 1 when doing MAC table reading
      than writing.  To compensate that a special code was added to shift the
      register value by 1 before applying those bits.  This is wrong when the
      code is running on KSZ8863, so this special code is only executed when
      KSZ8795 is detected.
      
      Fixes: 4b20a07e ("net: dsa: microchip: ksz8795: add support for ksz88xx chips")
      Signed-off-by: default avatarTristram Ha <Tristram.Ha@microchip.com>
      Reviewed-by: default avatarHoratiu Vultur <horatiu.vultur@microchip.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4bdf79d6
    • David S. Miller's avatar
      Merge branch 'sched-fixes' · 6e8778f8
      David S. Miller authored
      Victor Nogueira says:
      
      ====================
      net: sched: Fixes for classifiers
      
      Four different classifiers (bpf, u32, matchall, and flower) are
      calling tcf_bind_filter in their callbacks, but arent't undoing it by
      calling tcf_unbind_filter if their was an error after binding.
      
      This patch set fixes all this by calling tcf_unbind_filter in such
      cases.
      
      This set also undoes a refcount decrement in cls_u32 when an update
      fails under specific conditions which are described in patch #3.
      
      v1 -> v2:
      * Remove blank line after fixes tag
      * Fix reverse xmas tree issues pointed out by Simon
      
      v2 -> v3:
      * Inline functions cls_bpf_set_parms and fl_set_parms to avoid adding
        yet another parameter (and a return value at it) to them.
      * Remove similar fixes for u32 and matchall, which will be sent soon,
        once we find a way to do the fixes without adding a return parameter
        to their set_parms functions.
      
      v3 -> v4:
      * Inline mall_set_parms to avoid adding yet another parameter.
      * Remove set_flags parameter from u32_set_parms and create a separate
        function for calling tcf_bind_filter and tcf_unbind_filter in case of
        failure.
      * Change cover letter title to also encompass refcnt fix for u32
      
      v4 -> v5:
      * Change back tag to net
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6e8778f8
    • Victor Nogueira's avatar
      net: sched: cls_flower: Undo tcf_bind_filter in case of an error · ac177a33
      Victor Nogueira authored
      If TCA_FLOWER_CLASSID is specified in the netlink message, the code will
      call tcf_bind_filter. However, if any error occurs after that, the code
      should undo this by calling tcf_unbind_filter.
      
      Fixes: 77b9900e ("tc: introduce Flower classifier")
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ac177a33
    • Victor Nogueira's avatar
      net: sched: cls_bpf: Undo tcf_bind_filter in case of an error · 26a22194
      Victor Nogueira authored
      If cls_bpf_offload errors out, we must also undo tcf_bind_filter that
      was done before the error.
      
      Fix that by calling tcf_unbind_filter in errout_parms.
      
      Fixes: eadb4148 ("net: cls_bpf: add support for marking filters as hardware-only")
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      26a22194
    • Victor Nogueira's avatar
      net: sched: cls_u32: Undo refcount decrement in case update failed · e8d3d78c
      Victor Nogueira authored
      In the case of an update, when TCA_U32_LINK is set, u32_set_parms will
      decrement the refcount of the ht_down (struct tc_u_hnode) pointer
      present in the older u32 filter which we are replacing. However, if
      u32_replace_hw_knode errors out, the update command fails and that
      ht_down pointer continues decremented. To fix that, when
      u32_replace_hw_knode fails, check if ht_down's refcount was decremented
      and undo the decrement.
      
      Fixes: d34e3e18 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e8d3d78c
    • Victor Nogueira's avatar
      net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knode · 9cb36fae
      Victor Nogueira authored
      When u32_replace_hw_knode fails, we need to undo the tcf_bind_filter
      operation done at u32_set_parms.
      
      Fixes: d34e3e18 ("net: cls_u32: Add support for skip-sw flag to tc u32 classifier.")
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9cb36fae
    • Victor Nogueira's avatar
      net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after mall_set_parms · b3d0e048
      Victor Nogueira authored
      In case an error occurred after mall_set_parms executed successfully, we
      must undo the tcf_bind_filter call it issues.
      
      Fix that by calling tcf_unbind_filter in err_replace_hw_filter label.
      
      Fixes: ec2507d2 ("net/sched: cls_matchall: Fix error path")
      Signed-off-by: default avatarVictor Nogueira <victor@mojatatu.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: default avatarPedro Tammela <pctammela@mojatatu.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b3d0e048
  4. 15 Jul, 2023 2 commits