1. 20 Oct, 2023 5 commits
    • Michal Schmidt's avatar
      iavf: initialize waitqueues before starting watchdog_task · 7db31110
      Michal Schmidt authored
      It is not safe to initialize the waitqueues after queueing the
      watchdog_task. It will be using them.
      
      The chance of this causing a real problem is very small, because
      there will be some sleeping before any of the waitqueues get used.
      I got a crash only after inserting an artificial sleep in iavf_probe.
      
      Queue the watchdog_task as the last step in iavf_probe. Add a comment to
      prevent repeating the mistake.
      
      Fixes: fe2647ab ("i40evf: prevent VF close returning before state transitions to DOWN")
      Signed-off-by: default avatarMichal Schmidt <mschmidt@redhat.com>
      Reviewed-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: default avatarPrzemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7db31110
    • Mirsad Goran Todorovac's avatar
      r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1 · f97eee48
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race bug:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
      
      race at unknown origin, with read to 0xffff888117e43510 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
      asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0x80003fff -> 0x3402805f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      drivers/net/ethernet/realtek/r8169_main.c:
      ==========================================
         4429
       → 4430                 status = le32_to_cpu(desc->opts1);
         4431                 if (status & DescOwn)
         4432                         break;
         4433
         4434                 /* This barrier is needed to keep us from reading
         4435                  * any other fields out of the Rx descriptor until
         4436                  * we know the status of DescOwn
         4437                  */
         4438                 dma_rmb();
         4439
         4440                 if (unlikely(status & RxRES)) {
         4441                         if (net_ratelimit())
         4442                                 netdev_warn(dev, "Rx ERROR. status = %08x\n",
      
      Marco Elver explained that dma_rmb() doesn't prevent the compiler to tear up the access to
      desc->opts1 which can be written to concurrently. READ_ONCE() should prevent that from
      happening:
      
         4429
       → 4430                 status = le32_to_cpu(READ_ONCE(desc->opts1));
         4431                 if (status & DescOwn)
         4432                         break;
         4433
      
      As the consequence of this fix, this KCSAN warning was eliminated.
      
      Fixes: 6202806e ("r8169: drop member opts1_mask from struct rtl8169_private")
      Suggested-by: default avatarMarco Elver <elver@google.com>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/Signed-off-by: default avatarMirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: default avatarMarco Elver <elver@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f97eee48
    • Mirsad Goran Todorovac's avatar
      r8169: fix the KCSAN reported data-race in rtl_tx while reading TxDescArray[entry].opts1 · dcf75a0f
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      
      race at unknown origin, with read to 0xffff888140d37570 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
      asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0xb0000042 -> 0x00000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      The read side is in
      
      drivers/net/ethernet/realtek/r8169_main.c
      =========================================
         4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
         4356                    int budget)
         4357 {
         4358         unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
         4359         struct sk_buff *skb;
         4360
         4361         dirty_tx = tp->dirty_tx;
         4362
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
         4364                 unsigned int entry = dirty_tx % NUM_TX_DESC;
         4365                 u32 status;
         4366
       → 4367                 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
         4368                 if (status & DescOwn)
         4369                         break;
         4370
         4371                 skb = tp->tx_skb[entry].skb;
         4372                 rtl8169_unmap_tx_skb(tp, entry);
         4373
         4374                 if (skb) {
         4375                         pkts_compl++;
         4376                         bytes_compl += skb->len;
         4377                         napi_consume_skb(skb, budget);
         4378                 }
         4379                 dirty_tx++;
         4380         }
         4381
         4382         if (tp->dirty_tx != dirty_tx) {
         4383                 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
         4384                 WRITE_ONCE(tp->dirty_tx, dirty_tx);
         4385
         4386                 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
         4387                                               rtl_tx_slots_avail(tp),
         4388                                               R8169_TX_START_THRS);
         4389                 /*
         4390                  * 8168 hack: TxPoll requests are lost when the Tx packets are
         4391                  * too close. Let's kick an extra TxPoll request when a burst
         4392                  * of start_xmit activity is detected (if it is not detected,
         4393                  * it is slow enough). -- FR
         4394                  * If skb is NULL then we come here again once a tx irq is
         4395                  * triggered after the last fragment is marked transmitted.
         4396                  */
         4397                 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
         4399         }
         4400 }
      
      tp->TxDescArray[entry].opts1 is reported to have a data-race and READ_ONCE() fixes
      this KCSAN warning.
      
         4366
       → 4367                 status = le32_to_cpu(READ_ONCE(tp->TxDescArray[entry].opts1));
         4368                 if (status & DescOwn)
         4369                         break;
         4370
      
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Marco Elver <elver@google.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/Signed-off-by: default avatarMirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: default avatarMarco Elver <elver@google.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dcf75a0f
    • Mirsad Goran Todorovac's avatar
      r8169: fix the KCSAN reported data-race in rtl_tx() while reading tp->cur_tx · c1c0ce31
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll [r8169] / rtl8169_start_xmit [r8169]
      
      write (marked) to 0xffff888102474b74 of 4 bytes by task 5358 on cpu 29:
      rtl8169_start_xmit (drivers/net/ethernet/realtek/r8169_main.c:4254) r8169
      dev_hard_start_xmit (./include/linux/netdevice.h:4889 ./include/linux/netdevice.h:4903 net/core/dev.c:3544 net/core/dev.c:3560)
      sch_direct_xmit (net/sched/sch_generic.c:342)
      __dev_queue_xmit (net/core/dev.c:3817 net/core/dev.c:4306)
      ip_finish_output2 (./include/linux/netdevice.h:3082 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:233)
      __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:293)
      ip_finish_output (net/ipv4/ip_output.c:328)
      ip_output (net/ipv4/ip_output.c:435)
      ip_send_skb (./include/net/dst.h:458 net/ipv4/ip_output.c:127 net/ipv4/ip_output.c:1486)
      udp_send_skb (net/ipv4/udp.c:963)
      udp_sendmsg (net/ipv4/udp.c:1246)
      inet_sendmsg (net/ipv4/af_inet.c:840 (discriminator 4))
      sock_sendmsg (net/socket.c:730 net/socket.c:753)
      __sys_sendto (net/socket.c:2177)
      __x64_sys_sendto (net/socket.c:2185)
      do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      read to 0xffff888102474b74 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4397 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
      asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0x002f4815 -> 0x002f4816
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      The write side of drivers/net/ethernet/realtek/r8169_main.c is:
      ==================
         4251         /* rtl_tx needs to see descriptor changes before updated tp->cur_tx */
         4252         smp_wmb();
         4253
       → 4254         WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1);
         4255
         4256         stop_queue = !netif_subqueue_maybe_stop(dev, 0, rtl_tx_slots_avail(tp),
         4257                                                 R8169_TX_STOP_THRS,
         4258                                                 R8169_TX_START_THRS);
      
      The read side is the function rtl_tx():
      
         4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
         4356                    int budget)
         4357 {
         4358         unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
         4359         struct sk_buff *skb;
         4360
         4361         dirty_tx = tp->dirty_tx;
         4362
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
         4364                 unsigned int entry = dirty_tx % NUM_TX_DESC;
         4365                 u32 status;
         4366
         4367                 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
         4368                 if (status & DescOwn)
         4369                         break;
         4370
         4371                 skb = tp->tx_skb[entry].skb;
         4372                 rtl8169_unmap_tx_skb(tp, entry);
         4373
         4374                 if (skb) {
         4375                         pkts_compl++;
         4376                         bytes_compl += skb->len;
         4377                         napi_consume_skb(skb, budget);
         4378                 }
         4379                 dirty_tx++;
         4380         }
         4381
         4382         if (tp->dirty_tx != dirty_tx) {
         4383                 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
         4384                 WRITE_ONCE(tp->dirty_tx, dirty_tx);
         4385
         4386                 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
         4387                                               rtl_tx_slots_avail(tp),
         4388                                               R8169_TX_START_THRS);
         4389                 /*
         4390                  * 8168 hack: TxPoll requests are lost when the Tx packets are
         4391                  * too close. Let's kick an extra TxPoll request when a burst
         4392                  * of start_xmit activity is detected (if it is not detected,
         4393                  * it is slow enough). -- FR
         4394                  * If skb is NULL then we come here again once a tx irq is
         4395                  * triggered after the last fragment is marked transmitted.
         4396                  */
       → 4397                 if (tp->cur_tx != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
         4399         }
         4400 }
      
      Obviously from the code, an earlier detected data-race for tp->cur_tx was fixed in the
      line 4363:
      
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
      
      but the same solution is required for protecting the other access to tp->cur_tx:
      
       → 4397                 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
      
      The write in the line 4254 is protected with WRITE_ONCE(), but the read in the line 4397
      might have suffered read tearing under some compiler optimisations.
      
      The fix eliminated the KCSAN data-race report for this bug.
      
      It is yet to be evaluated what happens if tp->cur_tx changes between the test in line 4363
      and line 4397. This test should certainly not be cached by the compiler in some register
      for such a long time, while asynchronous writes to tp->cur_tx might have occurred in line
      4254 in the meantime.
      
      Fixes: 94d8a98e ("r8169: reduce number of workaround doorbell rings")
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Marco Elver <elver@google.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/Signed-off-by: default avatarMirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: default avatarMarco Elver <elver@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c1c0ce31
    • Maciej Fijalkowski's avatar
      i40e: xsk: remove count_mask · 913eda2b
      Maciej Fijalkowski authored
      Cited commit introduced a neat way of updating next_to_clean that does
      not require boundary checks on each increment. This was done by masking
      the new value with (ring length - 1) mask. Problem is that this is
      applicable only for power of 2 ring sizes, for every other size this
      assumption can not be made. In turn, it leads to cleaning descriptors
      out of order as well as splats:
      
      [ 1388.411915] Workqueue: events xp_release_deferred
      [ 1388.411919] RIP: 0010:xp_free+0x1a/0x50
      [ 1388.411921] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 57 70 48 8d 47 70 48 89 e5 48 39 d0 74 06 <5d> c3 cc cc cc cc 48 8b 57 60 83 82 b8 00 00 00 01 48 8b 57 60 48
      [ 1388.411922] RSP: 0018:ffa0000000a83cb0 EFLAGS: 00000206
      [ 1388.411923] RAX: ff11000119aa5030 RBX: 000000000000001d RCX: ff110001129b6e50
      [ 1388.411924] RDX: ff11000119aa4fa0 RSI: 0000000055555554 RDI: ff11000119aa4fc0
      [ 1388.411925] RBP: ffa0000000a83cb0 R08: 0000000000000000 R09: 0000000000000000
      [ 1388.411926] R10: 0000000000000001 R11: 0000000000000000 R12: ff11000115829b80
      [ 1388.411927] R13: 000000000000005f R14: 0000000000000000 R15: ff11000119aa4fc0
      [ 1388.411928] FS:  0000000000000000(0000) GS:ff11000277e00000(0000) knlGS:0000000000000000
      [ 1388.411929] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1388.411930] CR2: 00007f1f564e6c14 CR3: 000000000783c005 CR4: 0000000000771ef0
      [ 1388.411931] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1388.411931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1388.411932] PKRU: 55555554
      [ 1388.411933] Call Trace:
      [ 1388.411934]  <IRQ>
      [ 1388.411935]  ? show_regs+0x6e/0x80
      [ 1388.411937]  ? watchdog_timer_fn+0x1d2/0x240
      [ 1388.411939]  ? __pfx_watchdog_timer_fn+0x10/0x10
      [ 1388.411941]  ? __hrtimer_run_queues+0x10e/0x290
      [ 1388.411945]  ? clockevents_program_event+0xae/0x130
      [ 1388.411947]  ? hrtimer_interrupt+0x105/0x240
      [ 1388.411949]  ? __sysvec_apic_timer_interrupt+0x54/0x150
      [ 1388.411952]  ? sysvec_apic_timer_interrupt+0x7f/0x90
      [ 1388.411955]  </IRQ>
      [ 1388.411955]  <TASK>
      [ 1388.411956]  ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
      [ 1388.411958]  ? xp_free+0x1a/0x50
      [ 1388.411960]  i40e_xsk_clean_rx_ring+0x5d/0x100 [i40e]
      [ 1388.411968]  i40e_clean_rx_ring+0x14c/0x170 [i40e]
      [ 1388.411977]  i40e_queue_pair_disable+0xda/0x260 [i40e]
      [ 1388.411986]  i40e_xsk_pool_setup+0x192/0x1d0 [i40e]
      [ 1388.411993]  i40e_reconfig_rss_queues+0x1f0/0x1450 [i40e]
      [ 1388.412002]  xp_disable_drv_zc+0x73/0xf0
      [ 1388.412004]  ? mutex_lock+0x17/0x50
      [ 1388.412007]  xp_release_deferred+0x2b/0xc0
      [ 1388.412010]  process_one_work+0x178/0x350
      [ 1388.412011]  ? __pfx_worker_thread+0x10/0x10
      [ 1388.412012]  worker_thread+0x2f7/0x420
      [ 1388.412014]  ? __pfx_worker_thread+0x10/0x10
      [ 1388.412015]  kthread+0xf8/0x130
      [ 1388.412017]  ? __pfx_kthread+0x10/0x10
      [ 1388.412019]  ret_from_fork+0x3d/0x60
      [ 1388.412021]  ? __pfx_kthread+0x10/0x10
      [ 1388.412023]  ret_from_fork_asm+0x1b/0x30
      [ 1388.412026]  </TASK>
      
      It comes from picking wrong ring entries when cleaning xsk buffers
      during pool detach.
      
      Remove the count_mask logic and use they boundary check when updating
      next_to_process (which used to be a next_to_clean).
      
      Fixes: c8a8ca34 ("i40e: remove unnecessary memory writes of the next to clean pointer")
      Reported-by: default avatarTushar Vyavahare <tushar.vyavahare@intel.com>
      Tested-by: default avatarTushar Vyavahare <tushar.vyavahare@intel.com>
      Signed-off-by: default avatarMaciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20231018163908.40841-1-maciej.fijalkowski@intel.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      913eda2b
  2. 19 Oct, 2023 32 commits
  3. 18 Oct, 2023 3 commits