1. 25 Oct, 2023 2 commits
    • net/sched: act_ct: additional checks for outdated flows · a63b6622
      Vlad Buslov authored
      The current nf_flow_is_outdated() implementation marks for teardown any
      flow table flow whose state has diverged from its underlying CT connection
      status, which can be problematic in the following cases:
      
      - Flow has never been offloaded to hardware in the first place, either
      because the flow table has hardware offload disabled (flag
      NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on the
      'add' workqueue to be offloaded for the first time. The former is incorrect;
      the latter generates excessive deletions and additions of flows.
      
      - Flow is already pending to be updated on the workqueue. Tearing down such
      flows will also generate excessive removals from the flow table, especially
      on a highly loaded system where the latency to re-offload a flow via the
      'add' workqueue can be quite high.
      
      When considering a flow for teardown as outdated, verify that it is both
      offloaded to hardware and doesn't have any pending updates.
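
The check described above can be sketched in userspace C roughly as follows. This is a simplified model under stated assumptions, not the kernel's code: the struct and its field names are hypothetical stand-ins for the conntrack status bits and flow flags that the real implementation tests.

```c
#include <stdbool.h>

/* Hypothetical, simplified flow state; field names are illustrative only. */
struct flow_model {
	bool ct_seen_reply;      /* connection has seen traffic in both directions */
	bool hw_offloaded;       /* flow was actually offloaded to hardware */
	bool hw_update_pending;  /* an add/update is still queued on the workqueue */
	bool hw_established;     /* hardware entry already reflects established state */
};

/*
 * A flow counts as outdated only if its state diverged from the connection
 * AND it is really in hardware AND nothing is pending for it on the workqueue.
 */
static bool flow_is_outdated(const struct flow_model *f)
{
	return f->ct_seen_reply &&
	       f->hw_offloaded &&
	       !f->hw_update_pending &&
	       !f->hw_established;
}
```

With this shape, a flow that is still waiting on the workqueue, or that was never offloaded to hardware, is left alone instead of being torn down and re-added.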
      
      Fixes: 41f2c7c3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: flowtable: GC pushes back packets to classic path · 735795f6
      Pablo Neira Ayuso authored
      Since 41f2c7c3 ("net/sched: act_ct: Fix promotion of offloaded
      unreplied tuple"), flowtable GC pushes flows with IPS_SEEN_REPLY
      back to the classic path in every run, i.e. every second. This is because of
      a new check for NF_FLOW_HW_ESTABLISHED, which is specific to sched/act_ct.

      In Netfilter's flowtable case, NF_FLOW_HW_ESTABLISHED never gets set,
      and IPS_SEEN_REPLY is unreliable because users decide when to offload the
      flow, so that bit might only be set at a later stage.
      
      Fix it by adding a custom .gc handler that sched/act_ct can use to
      deal with its NF_FLOW_HW_ESTABLISHED bit.
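
A minimal sketch of the "optional custom handler" idea, written as a userspace model. The type and function names below are hypothetical; the point is only that the generic GC step consults a per-user callback when one is registered, so a user that never sets the "established" bit is not affected by a check that only makes sense for act_ct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified flow state. */
struct flow_model {
	bool seen_reply;
	bool hw_established;
};

/* Hypothetical per-user operations; a NULL gc hook means "no extra rule". */
struct flowtable_type_model {
	bool (*gc)(const struct flow_model *flow);
};

/* Handler that only the act_ct-like user would install. */
static bool act_ct_gc(const struct flow_model *flow)
{
	return flow->seen_reply && !flow->hw_established;
}

/* Generic GC step: apply the custom rule only if the user registered one. */
static bool gc_should_teardown(const struct flowtable_type_model *type,
			       const struct flow_model *flow)
{
	return type->gc != NULL && type->gc(flow);
}
```

A table type with `gc == NULL` (standing in for the plain Netfilter flowtable here) never has flows pushed back by this rule, regardless of the reply bit.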
      
      Fixes: 41f2c7c3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
      Reported-by: Vladimir Smelhaus <vl.sm@email.cz>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 22 Oct, 2023 13 commits
    • tcp: fix wrong RTO timeout when received SACK reneging · d2a0fc37
      Fred Chen authored
      This commit fixes the wrong RTO timeout when SACK reneging is detected.

      When an ACK arrives that indicates SACK reneging, tcp_check_sack_reneging()
      rearms the RTO timer for max(1/2*srtt, 10ms) into the future.

      But since commit 62d9f1a6 ("tcp: fix TLP timer not set when
      CA_STATE changes from DISORDER to OPEN") was merged, tcp_set_xmit_timer()
      is called after tcp_fastretrans_alert() (which does the SACK reneging check),
      so the RTO timeout is overwritten by tcp_set_xmit_timer() with
      icsk_rto instead of 1/2*srtt.
      
      Here is a packetdrill script to check this bug:
      0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0    bind(3, ..., ...) = 0
      +0    listen(3, 1) = 0
      
      // simulate srtt to 100ms
      +0    < S 0:0(0) win 32792 <mss 1000, sackOK,nop,nop,nop,wscale 7>
      +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      +.1    < . 1:1(0) ack 1 win 1024
      
      +0    accept(3, ..., ...) = 4
      
      +0    write(4, ..., 10000) = 10000
      +0    > P. 1:10001(10000) ack 1
      
      // inject sack
      +.1    < . 1:1(0) ack 1 win 257 <sack 1001:10001,nop,nop>
      +0    > . 1:1001(1000) ack 1
      
      // inject sack reneging
      +.1    < . 1:1(0) ack 1001 win 257 <sack 9001:10001,nop,nop>
      
      // we expect rto fired in 1/2*srtt (50ms)
      +.05    > . 1001:2001(1000) ack 1
      
      This fix removes FLAG_SET_XMIT_TIMER from ack_flag when
      tcp_check_sack_reneging() sets the RTO timer to 1/2*srtt, to avoid
      it being overwritten later.
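
The interaction can be modelled in a few lines of userspace C. This is a sketch under assumptions: the flag values and function names are made up for illustration, and the short delay is taken to be the larger of srtt/2 and 10 ms, which matches the 50 ms expectation in the packetdrill script for a 100 ms srtt.

```c
#include <stdint.h>

/* Illustrative flag values; the kernel's actual definitions differ. */
#define FLAG_SACK_RENEGING   0x1u
#define FLAG_SET_XMIT_TIMER  0x2u

/* Delay for the reneging case: half the smoothed RTT, but at least 10 ms. */
static uint32_t reneging_delay_ms(uint32_t srtt_ms)
{
	uint32_t half = srtt_ms / 2;

	return half > 10 ? half : 10;
}

/*
 * If reneging was detected, arm the timer with the short delay and clear
 * FLAG_SET_XMIT_TIMER so the later generic rearm does not replace it.
 * Returns the timer value (in ms) that ends up armed, or 0 if none.
 */
static uint32_t process_ack(uint32_t *flag, uint32_t srtt_ms, uint32_t rto_ms)
{
	uint32_t armed_ms = 0;

	if (*flag & FLAG_SACK_RENEGING) {
		armed_ms = reneging_delay_ms(srtt_ms);
		*flag &= ~FLAG_SET_XMIT_TIMER;
	}
	if (*flag & FLAG_SET_XMIT_TIMER)
		armed_ms = rto_ms;	/* generic rearm with the full RTO */

	return armed_ms;
}
```

Without the `*flag &= ~FLAG_SET_XMIT_TIMER` line, the second `if` would run in the reneging case too and replace the short delay with the full RTO, which is the bug the commit describes.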
      
      Fixes: 62d9f1a6 ("tcp: fix TLP timer not set when CA_STATE changes from DISORDER to OPEN")
      Signed-off-by: Fred Chen <fred.chenchen03@gmail.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Tested-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'r8152-reg-garbage' · a40614fe
      David S. Miller authored
      Douglas Anderson says:
      
      ====================
      r8152: Avoid writing garbage to the adapter's registers
      
      This series is the result of a cooperative debug effort between
      Realtek and the ChromeOS team. On ChromeOS, we've noticed that Realtek
      Ethernet adapters can sometimes get so wedged that even a reboot of
      the host can't get them to enumerate again, assuming that the adapter
      was on a powered hub and didn't lose power when the host rebooted. This
      is sometimes seen in the ChromeOS automated testing lab. The only way
      to recover adapters in this state is to manually power cycle them.
      
      I managed to reproduce one instance of this wedging (unknown if this
      is truly related to what the test lab sees) by doing this:
      1. Start a flood ping from a host to the device.
      2. Drop the device into kdb.
      3. Wait 90 seconds.
      4. Resume from kdb (the "g" command).
      5. Wait another 45 seconds.
      
      Upon analysis, Realtek realized this was happening:
      
      1. The Linux driver was getting a "Tx timeout" after resuming from kdb
         and then trying to reset itself.
      2. As part of the reset, the Linux driver was attempting to do a
         read-modify-write of the adapter's registers.
      3. The read would fail (due to a timeout) and the driver pretended
         that the register contained all 0xFFs. See commit f53a7ad1
         ("r8152: Set memory to all 0xFFs on failed reg reads")
      4. The driver would take this value of all 0xFFs, modify it, and
         attempt to write it back to the adapter.
      5. By this time the USB channel seemed to recover and thus we'd
         successfully write a value that was mostly 0xFFs to the adapter.
      6. The adapter didn't like this and would wedge itself.
      
      Another engineer also managed to reproduce wedging of the Realtek
      Ethernet adapter during a reboot test on an AMD Chromebook. In that
      case he was sometimes seeing -EPIPE returned from the control
      transfers.
      
      This patch series fixes both issues.
      
      Changes in v5:
      - ("Run the unload routine if we have errors during probe") new for v5.
      - ("Cancel hw_phy_work if we have an error in probe") new for v5.
      - ("Release firmware if we have an error in probe") new for v5.
      - Removed extra mutex_unlock() left over in v4.
      - Fixed minor typos.
      - Don't queue an unbind/bind reset if probe fails; just retry probe.
      
      Changes in v4:
      - Took out some unnecessary locks/unlocks of the control mutex.
      - Added comment about reading version causing probe fail if 3 fails.
      - Added text to commit msg about the potential unbind/bind loop.
      
      Changes in v3:
      - Fixed v2 changelog ending up in the commit message.
      - farmework -> framework in comments.
      
      Changes in v2:
      - ("Check for unplug in rtl_phy_patch_request()") new for v2.
      - ("Check for unplug in r8153b_ups_en() / r8153c_ups_en()") new for v2.
      - ("Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE") new for v2.
      - Reset patch no longer based on retry patch, since that was dropped.
      - Reset patch should be robust even if failures happen in probe.
      - Switched booleans to bits in the "flags" variable.
      - Check for -ENODEV instead of "udev->state == USB_STATE_NOTATTACHED"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Block future register access if register access fails · d9962b0d
      Douglas Anderson authored
      Even though the functions to read/write registers can fail, most of
      the places in the r8152 driver that read/write register values don't
      check error codes. The lack of error code checking is problematic in
      at least two ways.
      
      The first problem is that the r8152 driver often uses code patterns
      similar to this:
        x = read_register();
        x = x | SOME_BIT;
        write_register(x);
      
      ...with the above pattern, if the read_register() fails and returns
      garbage then we'll end up trying to write modified garbage back to the
      Realtek adapter. If the write_register() succeeds that's bad. Note
      that as of commit f53a7ad1 ("r8152: Set memory to all 0xFFs on
      failed reg reads") the "garbage" returned by read_register() will at
      least be consistent garbage, but it is still garbage.
      
      It turns out that this problem is very serious. Writing garbage to
      some of the hardware registers on the Ethernet adapter can put the
      adapter in such a bad state that it needs to be power cycled (fully
      unplugged and plugged in again) before it can enumerate again.
      
      The second problem is that the r8152 driver generally has functions
      that are long sequences of register writes. Assuming everything will
      be OK if a random register write fails in the middle isn't a great
      assumption.
      
      One might wonder if the above two problems are real. You could ask if
      we would really have a successful write after a failed read. It turns
      out that the answer appears to be "yes, this can happen". In fact,
      we've seen at least two distinct failure modes where this happens.
      
      On a sc7180-trogdor Chromebook if you drop into kdb for a while and
      then resume, you can see:
      1. We get a "Tx timeout"
      2. The "Tx timeout" queues up a USB reset.
      3. In rtl8152_pre_reset() we try to reinit the hardware.
      4. The first several (2-9) register accesses fail with a timeout, then
         things recover.
      
      The above test case was actually fixed by the patch ("r8152: Increase
      USB control msg timeout to 5000ms as per spec") but at least shows
      that we really can see successful calls after failed ones.
      
      On a different (AMD) based Chromebook with a particular adapter, we
      found that during reboot tests we'd also sometimes get a transitory
      failure. In this case we saw -EPIPE being returned sometimes. Retrying
      worked, but retrying is not always safe for all register accesses
      since reading/writing some registers might have side effects (like
      registers that clear on read).
      
      Let's fully lock out all register access if a register access fails.
      When we do this, we'll try to queue up a USB reset and try to unlock
      register access after the reset. This is slightly trickier than it
      sounds since the r8152 driver has an optimized reset sequence that
      only works reliably after probe happens. In order to handle this, we
      avoid the optimized reset if probe didn't finish. Instead, we simply
      retry the probe routine in this case.
      
      When locking out access, we'll use the existing infrastructure that
      the driver was using when it detected we were unplugged. This keeps us
      from getting stuck in delay loops in some parts of the driver.
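
The lockout idea can be illustrated with a small userspace model. Everything here is hypothetical (the struct, the `fail_next` knob that simulates timed-out transfers, the function names); it only shows why a single "inaccessible" flag stops modified garbage from reaching the device when a read fails and the following write would otherwise succeed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model used only for this illustration. */
struct dev_model {
	bool inaccessible;	/* set once any register access has failed */
	int fail_next;		/* number of upcoming transfers that time out */
	uint32_t reg;		/* a single simulated hardware register */
};

/* Common gate for every access: refuse once the device is locked out. */
static int transfer(struct dev_model *d)
{
	if (d->inaccessible)
		return -1;
	if (d->fail_next > 0) {
		d->fail_next--;
		d->inaccessible = true;	/* first failure blocks all later accesses */
		return -1;
	}
	return 0;
}

static int read_register(struct dev_model *d, uint32_t *val)
{
	if (transfer(d) != 0) {
		*val = 0xffffffffu;	/* consistent garbage on failure */
		return -1;
	}
	*val = d->reg;
	return 0;
}

static int write_register(struct dev_model *d, uint32_t val)
{
	if (transfer(d) != 0)
		return -1;
	d->reg = val;
	return 0;
}

/* The unchecked read-modify-write pattern from the commit message. */
static void set_bit_unchecked(struct dev_model *d, uint32_t bit)
{
	uint32_t x;

	read_register(d, &x);
	x |= bit;
	write_register(d, x);
}
```

In this model, a failed read followed by a transfer that would have succeeded leaves `reg` untouched, because the write is refused once the flag is set. A real driver would then schedule a reset and clear the flag afterwards, which this sketch does not attempt to model.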
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE · 715f67f3
      Douglas Anderson authored
      Whenever the RTL8152_UNPLUG is set that just tells the driver that all
      accesses will fail and we should just immediately bail. A future patch
      will use this same concept at a time when the driver hasn't actually
      been unplugged but is about to be reset. Rename the flag in
      preparation for the future patch.
      
      This is a no-op change and just a search and replace.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en() · bc65cc42
      Douglas Anderson authored
      If the adapter is unplugged while we're looping in r8153b_ups_en() /
      r8153c_ups_en() we could end up looping for 10 seconds (20 ms * 500
      loops). Add code similar to what's done in other places in the driver
      to check for unplug and bail.
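
As a rough illustration of "check for unplug and bail" inside a polling loop, here is a userspace sketch. The struct, the `ready_after` knob, and the return codes are assumptions made for the example; a real driver would sleep between polls (20 ms per iteration in the case described above).

```c
#include <stdbool.h>

/* Hypothetical device model for the example. */
struct dev_model {
	bool inaccessible;	/* stand-in for the driver's unplug flag */
	int ready_after;	/* polls needed before the simulated hardware is ready */
};

static bool hw_ready(struct dev_model *d)
{
	if (d->ready_after > 0) {
		d->ready_after--;
		return false;
	}
	return true;
}

/*
 * Returns the number of polls performed on success,
 * -1 if the device became inaccessible, -2 if max_polls was exhausted.
 */
static int wait_ready(struct dev_model *d, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (d->inaccessible)
			return -1;	/* bail immediately instead of spinning */
		if (hw_ready(d))
			return i + 1;
		/* a real driver would sleep here between polls */
	}
	return -2;
}
```

The early `return -1` is what avoids waiting out the full loop (about 10 seconds in the cases described) when the adapter is already gone.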
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Check for unplug in rtl_phy_patch_request() · dc90ba37
      Douglas Anderson authored
      If the adapter is unplugged while we're looping in
      rtl_phy_patch_request() we could end up looping for 10 seconds (2 ms *
      5000 loops). Add code similar to what's done in other places in the
      driver to check for unplug and bail.
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Release firmware if we have an error in probe · b8d35024
      Douglas Anderson authored
      The error handling in rtl8152_probe() is missing a call to release
      firmware. Add it in to match what's in the cleanup code in
      rtl8152_disconnect().
      
      Fixes: 9370f2d0 ("r8152: support request_firmware for RTL8153")
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Cancel hw_phy_work if we have an error in probe · bb8adff9
      Douglas Anderson authored
      The error handling in rtl8152_probe() is missing a call to cancel the
      hw_phy_work. Add it in to match what's in the cleanup code in
      rtl8152_disconnect().
      
      Fixes: a028a9e0 ("r8152: move the settings of PHY to a work queue")
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Run the unload routine if we have errors during probe · 5dd17689
      Douglas Anderson authored
      The rtl8152_probe() function lacks a call to the chip-specific
      unload() routine when it sees an error in probe. Add it in to match
      the cleanup code in rtl8152_disconnect().
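
The general pattern behind this and the two preceding probe fixes is that the error path of a probe routine should undo everything the disconnect path undoes. A deliberately simplified userspace sketch, with hypothetical field names and a single error label:

```c
#include <stdbool.h>

/* Hypothetical bookkeeping of what probe has set up so far. */
struct probe_model {
	bool fw_requested;	/* firmware was requested */
	bool chip_inited;	/* chip-specific init ran */
	bool phy_work_queued;	/* deferred PHY work is pending */
	bool fail_late;		/* simulate a failure near the end of probe */
};

static int probe(struct probe_model *p)
{
	p->fw_requested = true;
	p->chip_inited = true;
	p->phy_work_queued = true;

	if (p->fail_late)
		goto err;

	return 0;

err:
	/* Mirror the disconnect path: leave nothing behind on failure. */
	p->phy_work_queued = false;	/* cancel the pending work */
	p->chip_inited = false;		/* run the chip-specific unload */
	p->fw_requested = false;	/* release the firmware */
	return -1;
}
```

Omitting any one of the three cleanup lines models one of the three leaks that the probe fixes in this series address.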
      
      Fixes: ac718b69 ("net/usb: new driver for RTL8152")
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: Increase USB control msg timeout to 5000ms as per spec · a5feba71
      Douglas Anderson authored
      According to the comment next to USB_CTRL_GET_TIMEOUT and
      USB_CTRL_SET_TIMEOUT, although sending/receiving control messages is
      usually quite fast, the spec allows them to take up to 5 seconds.
      Let's increase the timeout in the Realtek driver from 500ms to 5000ms
      (using the #defines) to account for this.
      
      This is not just a theoretical change. The need for the longer timeout
      was seen in testing. Specifically, if you drop a sc7180-trogdor based
      Chromebook into the kdb debugger and then "go" again after sitting in
      the debugger for a while, the next USB control message takes a long
      time. Out of ~40 tests the slowest USB control message was 4.5
      seconds.
      
      While dropping into kdb is not exactly an end-user scenario, the above
      is similar to what could happen due to a temporary interrupt storm,
      what could happen if there was a host controller (HW or SW) issue, or
      what could happen if the Realtek device got into a confused state and
      needed time to recover.
      
      This change is fairly critical since the r8152 driver in Linux doesn't
      expect register reads/writes (which are backed by USB control
      messages) to fail.
      
      Fixes: ac718b69 ("net/usb: new driver for RTL8152")
      Suggested-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Grant Grundler <grundler@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usb: smsc95xx: Fix uninit-value access in smsc95xx_read_reg · 51a32e82
      Shigeru Yoshida authored
      syzbot reported the following uninit-value access issue [1]:
      
      smsc95xx 1-1:0.0 (unnamed net_device) (uninitialized): Failed to read reg index 0x00000030: -32
      smsc95xx 1-1:0.0 (unnamed net_device) (uninitialized): Error reading E2P_CMD
      =====================================================
      BUG: KMSAN: uninit-value in smsc95xx_reset+0x409/0x25f0 drivers/net/usb/smsc95xx.c:896
       smsc95xx_reset+0x409/0x25f0 drivers/net/usb/smsc95xx.c:896
       smsc95xx_bind+0x9bc/0x22e0 drivers/net/usb/smsc95xx.c:1131
       usbnet_probe+0x100b/0x4060 drivers/net/usb/usbnet.c:1750
       usb_probe_interface+0xc75/0x1210 drivers/usb/core/driver.c:396
       really_probe+0x506/0xf40 drivers/base/dd.c:658
       __driver_probe_device+0x2a7/0x5d0 drivers/base/dd.c:800
       driver_probe_device+0x72/0x7b0 drivers/base/dd.c:830
       __device_attach_driver+0x55a/0x8f0 drivers/base/dd.c:958
       bus_for_each_drv+0x3ff/0x620 drivers/base/bus.c:457
       __device_attach+0x3bd/0x640 drivers/base/dd.c:1030
       device_initial_probe+0x32/0x40 drivers/base/dd.c:1079
       bus_probe_device+0x3d8/0x5a0 drivers/base/bus.c:532
       device_add+0x16ae/0x1f20 drivers/base/core.c:3622
       usb_set_configuration+0x31c9/0x38c0 drivers/usb/core/message.c:2207
       usb_generic_driver_probe+0x109/0x2a0 drivers/usb/core/generic.c:238
       usb_probe_device+0x290/0x4a0 drivers/usb/core/driver.c:293
       really_probe+0x506/0xf40 drivers/base/dd.c:658
       __driver_probe_device+0x2a7/0x5d0 drivers/base/dd.c:800
       driver_probe_device+0x72/0x7b0 drivers/base/dd.c:830
       __device_attach_driver+0x55a/0x8f0 drivers/base/dd.c:958
       bus_for_each_drv+0x3ff/0x620 drivers/base/bus.c:457
       __device_attach+0x3bd/0x640 drivers/base/dd.c:1030
       device_initial_probe+0x32/0x40 drivers/base/dd.c:1079
       bus_probe_device+0x3d8/0x5a0 drivers/base/bus.c:532
       device_add+0x16ae/0x1f20 drivers/base/core.c:3622
       usb_new_device+0x15f6/0x22f0 drivers/usb/core/hub.c:2589
       hub_port_connect drivers/usb/core/hub.c:5440 [inline]
       hub_port_connect_change drivers/usb/core/hub.c:5580 [inline]
       port_event drivers/usb/core/hub.c:5740 [inline]
       hub_event+0x53bc/0x7290 drivers/usb/core/hub.c:5822
       process_one_work kernel/workqueue.c:2630 [inline]
       process_scheduled_works+0x104e/0x1e70 kernel/workqueue.c:2703
       worker_thread+0xf45/0x1490 kernel/workqueue.c:2784
       kthread+0x3e8/0x540 kernel/kthread.c:388
       ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
      
      Local variable buf.i225 created at:
       smsc95xx_read_reg drivers/net/usb/smsc95xx.c:90 [inline]
       smsc95xx_reset+0x203/0x25f0 drivers/net/usb/smsc95xx.c:892
       smsc95xx_bind+0x9bc/0x22e0 drivers/net/usb/smsc95xx.c:1131
      
      CPU: 1 PID: 773 Comm: kworker/1:2 Not tainted 6.6.0-rc1-syzkaller-00125-ge42bebf6 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
      Workqueue: usb_hub_wq hub_event
      =====================================================
      
      Similar to e9c65989 ("net: usb: smsc75xx: Fix uninit-value access in
      __smsc75xx_read_reg"), this issue is caused because usbnet_read_cmd() reads
      fewer bytes than requested (zero bytes in the reproducer). In this case,
      'buf' is not properly filled.

      This patch fixes the issue by returning -ENODATA if usbnet_read_cmd() reads
      fewer bytes than requested.

      syzbot reported a similar uninit-value access issue [2]. The root cause is
      the same as mentioned above, and this patch addresses it as well.
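
The short-read check can be sketched in userspace C as below. `fake_read_cmd()` is a hypothetical stand-in for the transport call (it is not the usbnet API); the part that mirrors the described fix is treating "fewer bytes than requested" as an error instead of using the partially filled buffer.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transport: copies up to 'avail' bytes, returns the count copied. */
static int fake_read_cmd(void *buf, int size, const void *src, int avail)
{
	int n = avail < size ? avail : size;

	if (n > 0)
		memcpy(buf, src, (size_t)n);
	return n;
}

/* Read one 32-bit register; reject short reads so 'buf' is never used uninitialized. */
static int read_reg(uint32_t *data, const void *src, int avail)
{
	uint32_t buf;
	int ret = fake_read_cmd(&buf, (int)sizeof(buf), src, avail);

	if (ret < 0)
		return ret;
	if (ret < (int)sizeof(buf))
		return -ENODATA;	/* short read: buf is only partly filled */

	*data = buf;
	return 0;
}
```

Callers that check the return value never see the stale stack contents, which is what the sanitizer report was complaining about.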
      
      Fixes: 2f7ca802 ("net: Add SMSC LAN9500 USB2.0 10/100 ethernet adapter driver")
      Reported-and-tested-by: syzbot+c74c24b43c9ae534f0e0@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+2c97a98a5ba9ea9c23bd@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=c74c24b43c9ae534f0e0 [1]
      Closes: https://syzkaller.appspot.com/bug?extid=2c97a98a5ba9ea9c23bd [2]
      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: chelsio: cxgb4: add an error code check in t4_load_phy_fw · 9f771493
      Su Hui authored
      t4_set_params_timeout() can return -EINVAL on failure; add a check
      for this.
      Signed-off-by: Su Hui <suhui@nfschina.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ieee802154: adf7242: Fix some potential buffer overflow in adf7242_stats_show() · ca082f01
      Christophe JAILLET authored
      strncat() usage in adf7242_debugfs_init() is wrong.
      The size given to strncat() is the maximum number of bytes that can be
      written, excluding the trailing NUL.
      
      Here, the size that is passed, DNAME_INLINE_LEN, does not take into account
      the size of "adf7242-" that is already in the array.
      
      In order to fix it, use snprintf() instead.
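
A small standalone example of why snprintf() is the safer choice here. `NAME_LEN` and `build_name()` are made up for the illustration (the driver uses DNAME_INLINE_LEN as the buffer size). snprintf() is bounded by the size of the whole destination buffer, so the already-present "adf7242-" prefix is accounted for automatically.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative buffer size; the driver uses DNAME_INLINE_LEN instead. */
#define NAME_LEN 16

/*
 * Build "adf7242-<dev>" into 'out'. The bound covers prefix and suffix
 * together, and the result is always NUL-terminated. Returns the length
 * the full string would have had, as snprintf() does.
 */
static int build_name(char out[NAME_LEN], const char *dev)
{
	return snprintf(out, NAME_LEN, "adf7242-%s", dev);
}
```

A return value of `NAME_LEN` or more tells the caller that the name was truncated, something the strncat() approach gives no indication of.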
      
      Fixes: 7302b9d9 ("ieee802154/adf7242: Driver for ADF7242 MAC IEEE802154")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Oct, 2023 7 commits
  4. 20 Oct, 2023 8 commits
    • igb: Fix potential memory leak in igb_add_ethtool_nfc_entry · 8c0b48e0
      Mateusz Palczewski authored
      Add a check for the return value of igb_update_ethtool_nfc_entry so that
      in case of any errors the memory allocated for input will be freed.
      
      Fixes: 0e71def2 ("igb: add support of RX network flow classification")
      Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • treewide: Spelling fix in comment · fb71ba0e
      Kunwu Chan authored
      reques -> request
      
      Fixes: 09dde54c ("PS3: gelic: Add wireless support for PS3")
      Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • i40e: Fix I40E_FLAG_VF_VLAN_PRUNING value · 665e7d83
      Ivan Vecera authored
      Commit c87c938f ("i40e: Add VF VLAN pruning") added new
      PF flag I40E_FLAG_VF_VLAN_PRUNING but its value collides with
      existing I40E_FLAG_TOTAL_PORT_SHUTDOWN_ENABLED flag.
      
      Move the affected flag to the end of the flags and fix its value.
      
      Reproducer:
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close on
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning on
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off
      [ 6323.142585] i40e 0000:02:00.0: Setting link-down-on-close not supported on this port (because total-port-shutdown is enabled)
      netlink error: Operation not supported
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 vf-vlan-pruning off
      [root@cnb-03 ~]# ethtool --set-priv-flags enp2s0f0np0 link-down-on-close off
      
      The link-down-on-close flag cannot be modified after setting vf-vlan-pruning
      because vf-vlan-pruning shares the same bit with total-port-shutdown flag
      that prevents any modification of link-down-on-close flag.
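
The effect of two flags sharing one bit can be reproduced in a few lines. The bit positions and macro names below are illustrative only, not the driver's real values; the compile-time assertion shows one way such a collision could be caught early.

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Bit positions are illustrative only, not the driver's real values. */
#define FLAG_TOTAL_PORT_SHUTDOWN  BIT(27)
#define FLAG_VF_VLAN_PRUNING_OLD  BIT(27)	/* collides with the flag above */
#define FLAG_VF_VLAN_PRUNING_NEW  BIT(28)	/* moved to an unused bit */

/* Compile-time guard: the relocated flag must not overlap the other one. */
_Static_assert((FLAG_TOTAL_PORT_SHUTDOWN & FLAG_VF_VLAN_PRUNING_NEW) == 0,
	       "flags must not share a bit");

/* link-down-on-close may only be changed when total-port-shutdown is off. */
static int link_down_on_close_changeable(uint32_t flags)
{
	return !(flags & FLAG_TOTAL_PORT_SHUTDOWN);
}
```

Setting the old pruning flag makes `link_down_on_close_changeable()` report false even though total-port-shutdown was never enabled, which corresponds to the "not supported" error in the reproducer.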
      
      Fixes: c87c938f ("i40e: Add VF VLAN pruning")
      Cc: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Cc: Simon Horman <horms@kernel.org>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • iavf: initialize waitqueues before starting watchdog_task · 7db31110
      Michal Schmidt authored
      It is not safe to initialize the waitqueues after queueing the
      watchdog_task. It will be using them.
      
      The chance of this causing a real problem is very small, because
      there will be some sleeping before any of the waitqueues get used.
      I got a crash only after inserting an artificial sleep in iavf_probe.
      
      Queue the watchdog_task as the last step in iavf_probe. Add a comment to
      prevent repeating the mistake.
      
      Fixes: fe2647ab ("i40evf: prevent VF close returning before state transitions to DOWN")
      Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1 · f97eee48
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race bug:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
      
      race at unknown origin, with read to 0xffff888117e43510 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4430 drivers/net/ethernet/realtek/r8169_main.c:4583) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
      asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0x80003fff -> 0x3402805f
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      drivers/net/ethernet/realtek/r8169_main.c:
      ==========================================
         4429
       → 4430                 status = le32_to_cpu(desc->opts1);
         4431                 if (status & DescOwn)
         4432                         break;
         4433
         4434                 /* This barrier is needed to keep us from reading
         4435                  * any other fields out of the Rx descriptor until
         4436                  * we know the status of DescOwn
         4437                  */
         4438                 dma_rmb();
         4439
         4440                 if (unlikely(status & RxRES)) {
         4441                         if (net_ratelimit())
         4442                                 netdev_warn(dev, "Rx ERROR. status = %08x\n",
      
      Marco Elver explained that dma_rmb() doesn't prevent the compiler from tearing the access to
      desc->opts1, which can be written to concurrently. READ_ONCE() should prevent that from
      happening:
      
         4429
       → 4430                 status = le32_to_cpu(READ_ONCE(desc->opts1));
         4431                 if (status & DescOwn)
         4432                         break;
         4433
      
      As a consequence of this fix, this KCSAN warning was eliminated.
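
For readers unfamiliar with READ_ONCE(), the following userspace approximation shows the idea. It is a simplified sketch, not the kernel macro: it relies on the GCC/Clang `__typeof__` extension, omits the endianness conversion, and uses an illustrative ownership bit and struct name.

```c
#include <stdint.h>

/* Simplified userspace approximation of READ_ONCE(); GCC/Clang only. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define DESC_OWN UINT32_C(0x80000000)	/* illustrative ownership bit */

struct rx_desc_model {
	uint32_t opts1;		/* may be written concurrently by the device */
};

/*
 * Take exactly one snapshot of opts1 and make every decision from that
 * snapshot. Returns nonzero if the descriptor belongs to the CPU.
 */
static int desc_ready(struct rx_desc_model *desc, uint32_t *status)
{
	*status = READ_ONCE(desc->opts1);
	return !(*status & DESC_OWN);
}
```

The volatile access asks the compiler to perform a single full-width load rather than splitting or repeating it, so the ownership test and any later use of `status` refer to the same value.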
      
      Fixes: 6202806e ("r8169: drop member opts1_mask from struct rtl8169_private")
      Suggested-by: Marco Elver <elver@google.com>
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/
      Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: fix the KCSAN reported data-race in rtl_tx while reading TxDescArray[entry].opts1 · dcf75a0f
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      
      race at unknown origin, with read to 0xffff888140d37570 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4368 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1074 (discriminator 14))
      asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:645)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0xb0000042 -> 0x00000000
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      The read side is in
      
      drivers/net/ethernet/realtek/r8169_main.c
      =========================================
         4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
         4356                    int budget)
         4357 {
         4358         unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
         4359         struct sk_buff *skb;
         4360
         4361         dirty_tx = tp->dirty_tx;
         4362
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
         4364                 unsigned int entry = dirty_tx % NUM_TX_DESC;
         4365                 u32 status;
         4366
       → 4367                 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
         4368                 if (status & DescOwn)
         4369                         break;
         4370
         4371                 skb = tp->tx_skb[entry].skb;
         4372                 rtl8169_unmap_tx_skb(tp, entry);
         4373
         4374                 if (skb) {
         4375                         pkts_compl++;
         4376                         bytes_compl += skb->len;
         4377                         napi_consume_skb(skb, budget);
         4378                 }
         4379                 dirty_tx++;
         4380         }
         4381
         4382         if (tp->dirty_tx != dirty_tx) {
         4383                 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
         4384                 WRITE_ONCE(tp->dirty_tx, dirty_tx);
         4385
         4386                 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
         4387                                               rtl_tx_slots_avail(tp),
         4388                                               R8169_TX_START_THRS);
         4389                 /*
         4390                  * 8168 hack: TxPoll requests are lost when the Tx packets are
         4391                  * too close. Let's kick an extra TxPoll request when a burst
         4392                  * of start_xmit activity is detected (if it is not detected,
         4393                  * it is slow enough). -- FR
         4394                  * If skb is NULL then we come here again once a tx irq is
         4395                  * triggered after the last fragment is marked transmitted.
         4396                  */
         4397                 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
         4399         }
         4400 }
      
      KCSAN reports a data race on tp->TxDescArray[entry].opts1; reading it with READ_ONCE()
      fixes this KCSAN warning.
      
         4366
       → 4367                 status = le32_to_cpu(READ_ONCE(tp->TxDescArray[entry].opts1));
         4368                 if (status & DescOwn)
         4369                         break;
         4370
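
The idea behind the fix can be sketched outside the kernel. The block below is only an illustration under stated assumptions: READ_ONCE_U32(), DESC_OWN, struct tx_desc and count_completed() are hypothetical userspace stand-ins, not the r8169 definitions, and the le32_to_cpu() conversion is left out. A volatile access asks the compiler for exactly one full-width load of the status word, so a concurrent update is seen either entirely or not at all.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for READ_ONCE(): the volatile access makes the
 * compiler emit a single load instead of splitting or repeating it. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Hypothetical ownership bit: set while the device still owns the slot. */
#define DESC_OWN 0x80000000u

/* Hypothetical descriptor layout, for illustration only. */
struct tx_desc {
	uint32_t opts1;	/* status word, may be rewritten by the device */
	uint32_t opts2;
	uint64_t addr;
};

/* Count how many of the `pending` descriptors starting at `start`
 * have been handed back by the device. */
unsigned int count_completed(struct tx_desc *ring, unsigned int ring_size,
			     unsigned int start, unsigned int pending)
{
	unsigned int done = 0;

	assert(ring_size > 0);
	while (done < pending) {
		unsigned int entry = (start + done) % ring_size;
		/* One marked read per descriptor, as in the fixed loop. */
		uint32_t status = READ_ONCE_U32(ring[entry].opts1);

		if (status & DESC_OWN)
			break;
		done++;
	}
	return done;
}
```

The loop stops at the first descriptor whose ownership bit is still set, mirroring the `if (status & DescOwn) break;` test in rtl_tx().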
      
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Marco Elver <elver@google.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/
      Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: Marco Elver <elver@google.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dcf75a0f
    • r8169: fix the KCSAN reported data-race in rtl_tx() while reading tp->cur_tx · c1c0ce31
      Mirsad Goran Todorovac authored
      KCSAN reported the following data-race:
      
      ==================================================================
      BUG: KCSAN: data-race in rtl8169_poll [r8169] / rtl8169_start_xmit [r8169]
      
      write (marked) to 0xffff888102474b74 of 4 bytes by task 5358 on cpu 29:
      rtl8169_start_xmit (drivers/net/ethernet/realtek/r8169_main.c:4254) r8169
      dev_hard_start_xmit (./include/linux/netdevice.h:4889 ./include/linux/netdevice.h:4903 net/core/dev.c:3544 net/core/dev.c:3560)
      sch_direct_xmit (net/sched/sch_generic.c:342)
      __dev_queue_xmit (net/core/dev.c:3817 net/core/dev.c:4306)
      ip_finish_output2 (./include/linux/netdevice.h:3082 ./include/net/neighbour.h:526 ./include/net/neighbour.h:540 net/ipv4/ip_output.c:233)
      __ip_finish_output (net/ipv4/ip_output.c:311 net/ipv4/ip_output.c:293)
      ip_finish_output (net/ipv4/ip_output.c:328)
      ip_output (net/ipv4/ip_output.c:435)
      ip_send_skb (./include/net/dst.h:458 net/ipv4/ip_output.c:127 net/ipv4/ip_output.c:1486)
      udp_send_skb (net/ipv4/udp.c:963)
      udp_sendmsg (net/ipv4/udp.c:1246)
      inet_sendmsg (net/ipv4/af_inet.c:840 (discriminator 4))
      sock_sendmsg (net/socket.c:730 net/socket.c:753)
      __sys_sendto (net/socket.c:2177)
      __x64_sys_sendto (net/socket.c:2185)
      do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
      entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      
      read to 0xffff888102474b74 of 4 bytes by interrupt on cpu 21:
      rtl8169_poll (drivers/net/ethernet/realtek/r8169_main.c:4397 drivers/net/ethernet/realtek/r8169_main.c:4581) r8169
      __napi_poll (net/core/dev.c:6527)
      net_rx_action (net/core/dev.c:6596 net/core/dev.c:6727)
      __do_softirq (kernel/softirq.c:553)
      __irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632)
      irq_exit_rcu (kernel/softirq.c:647)
      common_interrupt (arch/x86/kernel/irq.c:247 (discriminator 14))
      asm_common_interrupt (./arch/x86/include/asm/idtentry.h:636)
      cpuidle_enter_state (drivers/cpuidle/cpuidle.c:291)
      cpuidle_enter (drivers/cpuidle/cpuidle.c:390)
      call_cpuidle (kernel/sched/idle.c:135)
      do_idle (kernel/sched/idle.c:219 kernel/sched/idle.c:282)
      cpu_startup_entry (kernel/sched/idle.c:378 (discriminator 1))
      start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294)
      secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433)
      
      value changed: 0x002f4815 -> 0x002f4816
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 21 PID: 0 Comm: swapper/21 Tainted: G             L     6.6.0-rc2-kcsan-00143-gb5cbe7c0 #41
      Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
      ==================================================================
      
      The write side of drivers/net/ethernet/realtek/r8169_main.c is:
      ==================
         4251         /* rtl_tx needs to see descriptor changes before updated tp->cur_tx */
         4252         smp_wmb();
         4253
       → 4254         WRITE_ONCE(tp->cur_tx, tp->cur_tx + frags + 1);
         4255
         4256         stop_queue = !netif_subqueue_maybe_stop(dev, 0, rtl_tx_slots_avail(tp),
         4257                                                 R8169_TX_STOP_THRS,
         4258                                                 R8169_TX_START_THRS);
      
      The read side is the function rtl_tx():
      
         4355 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
         4356                    int budget)
         4357 {
         4358         unsigned int dirty_tx, bytes_compl = 0, pkts_compl = 0;
         4359         struct sk_buff *skb;
         4360
         4361         dirty_tx = tp->dirty_tx;
         4362
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
         4364                 unsigned int entry = dirty_tx % NUM_TX_DESC;
         4365                 u32 status;
         4366
         4367                 status = le32_to_cpu(tp->TxDescArray[entry].opts1);
         4368                 if (status & DescOwn)
         4369                         break;
         4370
         4371                 skb = tp->tx_skb[entry].skb;
         4372                 rtl8169_unmap_tx_skb(tp, entry);
         4373
         4374                 if (skb) {
         4375                         pkts_compl++;
         4376                         bytes_compl += skb->len;
         4377                         napi_consume_skb(skb, budget);
         4378                 }
         4379                 dirty_tx++;
         4380         }
         4381
         4382         if (tp->dirty_tx != dirty_tx) {
         4383                 dev_sw_netstats_tx_add(dev, pkts_compl, bytes_compl);
         4384                 WRITE_ONCE(tp->dirty_tx, dirty_tx);
         4385
         4386                 netif_subqueue_completed_wake(dev, 0, pkts_compl, bytes_compl,
         4387                                               rtl_tx_slots_avail(tp),
         4388                                               R8169_TX_START_THRS);
         4389                 /*
         4390                  * 8168 hack: TxPoll requests are lost when the Tx packets are
         4391                  * too close. Let's kick an extra TxPoll request when a burst
         4392                  * of start_xmit activity is detected (if it is not detected,
         4393                  * it is slow enough). -- FR
         4394                  * If skb is NULL then we come here again once a tx irq is
         4395                  * triggered after the last fragment is marked transmitted.
         4396                  */
       → 4397                 if (tp->cur_tx != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
         4399         }
         4400 }
      
      As the code shows, an earlier detected data race on tp->cur_tx was already fixed in
      line 4363:
      
         4363         while (READ_ONCE(tp->cur_tx) != dirty_tx) {
      
      but the same solution is required for protecting the other access to tp->cur_tx:
      
       → 4397                 if (READ_ONCE(tp->cur_tx) != dirty_tx && skb)
         4398                         rtl8169_doorbell(tp);
      
      The write in line 4254 is protected with WRITE_ONCE(), but the read in line 4397
      might have suffered read tearing under some compiler optimisations.
      
      The fix eliminated the KCSAN data-race report for this bug.
      
      It is yet to be evaluated what happens if tp->cur_tx changes between the test in line 4363
      and line 4397. This test should certainly not be cached by the compiler in some register
      for such a long time, while asynchronous writes to tp->cur_tx might have occurred in line
      4254 in the meantime.
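
To make the two reads of the producer index concrete, here is a minimal userspace sketch. It uses C11 atomics in place of the kernel's WRITE_ONCE()/READ_ONCE() and smp_wmb(), which is a substitution and not what the driver does; struct tx_ring, tx_publish() and tx_complete() are hypothetical names. The point is that the final check performs a fresh load, so it also sees descriptors queued after the loop exited.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ring state: cur_tx is advanced by the transmit path,
 * dirty_tx is owned by the completion path. */
struct tx_ring {
	atomic_uint cur_tx;
	unsigned int dirty_tx;
};

/* Transmit path: publish `n` newly queued descriptors. The release
 * store orders earlier descriptor writes before the index update. */
void tx_publish(struct tx_ring *r, unsigned int n)
{
	unsigned int cur;

	assert(n > 0);
	cur = atomic_load_explicit(&r->cur_tx, memory_order_relaxed);
	atomic_store_explicit(&r->cur_tx, cur + n, memory_order_release);
}

/* Completion path: reclaim at most `hw_done` descriptors, then report
 * whether more work is still queued (the doorbell condition). */
bool tx_complete(struct tx_ring *r, unsigned int hw_done)
{
	unsigned int dirty = r->dirty_tx;

	while (hw_done &&
	       atomic_load_explicit(&r->cur_tx, memory_order_acquire) != dirty) {
		dirty++;
		hw_done--;
	}
	r->dirty_tx = dirty;

	/* Fresh load: cur_tx may have advanced since the loop exited. */
	return atomic_load_explicit(&r->cur_tx, memory_order_acquire) != dirty;
}
```

If the transmit path publishes between the loop and the final check, tx_complete() returns true and the caller would ring the doorbell; a value cached in a register from the loop would miss that update.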
      
      Fixes: 94d8a98e ("r8169: reduce number of workaround doorbell rings")
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: nic_swsd@realtek.com
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Marco Elver <elver@google.com>
      Cc: netdev@vger.kernel.org
      Link: https://lore.kernel.org/lkml/dc7fc8fa-4ea4-e9a9-30a6-7c83e6b53188@alu.unizg.hr/
      Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
      Acked-by: Marco Elver <elver@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1c0ce31
    • i40e: xsk: remove count_mask · 913eda2b
      Maciej Fijalkowski authored
      Cited commit introduced a neat way of updating next_to_clean that does
      not require boundary checks on each increment. This was done by masking
      the new value with a (ring length - 1) mask. The problem is that this
      works only for power-of-2 ring sizes; the assumption does not hold for
      any other size. In turn, it leads to cleaning descriptors out of order
      as well as splats:
      
      [ 1388.411915] Workqueue: events xp_release_deferred
      [ 1388.411919] RIP: 0010:xp_free+0x1a/0x50
      [ 1388.411921] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 8b 57 70 48 8d 47 70 48 89 e5 48 39 d0 74 06 <5d> c3 cc cc cc cc 48 8b 57 60 83 82 b8 00 00 00 01 48 8b 57 60 48
      [ 1388.411922] RSP: 0018:ffa0000000a83cb0 EFLAGS: 00000206
      [ 1388.411923] RAX: ff11000119aa5030 RBX: 000000000000001d RCX: ff110001129b6e50
      [ 1388.411924] RDX: ff11000119aa4fa0 RSI: 0000000055555554 RDI: ff11000119aa4fc0
      [ 1388.411925] RBP: ffa0000000a83cb0 R08: 0000000000000000 R09: 0000000000000000
      [ 1388.411926] R10: 0000000000000001 R11: 0000000000000000 R12: ff11000115829b80
      [ 1388.411927] R13: 000000000000005f R14: 0000000000000000 R15: ff11000119aa4fc0
      [ 1388.411928] FS:  0000000000000000(0000) GS:ff11000277e00000(0000) knlGS:0000000000000000
      [ 1388.411929] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1388.411930] CR2: 00007f1f564e6c14 CR3: 000000000783c005 CR4: 0000000000771ef0
      [ 1388.411931] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1388.411931] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1388.411932] PKRU: 55555554
      [ 1388.411933] Call Trace:
      [ 1388.411934]  <IRQ>
      [ 1388.411935]  ? show_regs+0x6e/0x80
      [ 1388.411937]  ? watchdog_timer_fn+0x1d2/0x240
      [ 1388.411939]  ? __pfx_watchdog_timer_fn+0x10/0x10
      [ 1388.411941]  ? __hrtimer_run_queues+0x10e/0x290
      [ 1388.411945]  ? clockevents_program_event+0xae/0x130
      [ 1388.411947]  ? hrtimer_interrupt+0x105/0x240
      [ 1388.411949]  ? __sysvec_apic_timer_interrupt+0x54/0x150
      [ 1388.411952]  ? sysvec_apic_timer_interrupt+0x7f/0x90
      [ 1388.411955]  </IRQ>
      [ 1388.411955]  <TASK>
      [ 1388.411956]  ? asm_sysvec_apic_timer_interrupt+0x1f/0x30
      [ 1388.411958]  ? xp_free+0x1a/0x50
      [ 1388.411960]  i40e_xsk_clean_rx_ring+0x5d/0x100 [i40e]
      [ 1388.411968]  i40e_clean_rx_ring+0x14c/0x170 [i40e]
      [ 1388.411977]  i40e_queue_pair_disable+0xda/0x260 [i40e]
      [ 1388.411986]  i40e_xsk_pool_setup+0x192/0x1d0 [i40e]
      [ 1388.411993]  i40e_reconfig_rss_queues+0x1f0/0x1450 [i40e]
      [ 1388.412002]  xp_disable_drv_zc+0x73/0xf0
      [ 1388.412004]  ? mutex_lock+0x17/0x50
      [ 1388.412007]  xp_release_deferred+0x2b/0xc0
      [ 1388.412010]  process_one_work+0x178/0x350
      [ 1388.412011]  ? __pfx_worker_thread+0x10/0x10
      [ 1388.412012]  worker_thread+0x2f7/0x420
      [ 1388.412014]  ? __pfx_worker_thread+0x10/0x10
      [ 1388.412015]  kthread+0xf8/0x130
      [ 1388.412017]  ? __pfx_kthread+0x10/0x10
      [ 1388.412019]  ret_from_fork+0x3d/0x60
      [ 1388.412021]  ? __pfx_kthread+0x10/0x10
      [ 1388.412023]  ret_from_fork_asm+0x1b/0x30
      [ 1388.412026]  </TASK>
      
      It comes from picking wrong ring entries when cleaning xsk buffers
      during pool detach.
      
      Remove the count_mask logic and use the boundary check when updating
      next_to_process (which used to be next_to_clean).
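
The difference between the two ways of advancing a ring index can be shown with a small standalone sketch. next_masked() and next_checked() are hypothetical helpers written for this illustration, not the i40e functions. With a ring of 6 entries, the masked variant maps index 1 to (1 + 1) & 5 = 0 instead of 2, which is the kind of out-of-order walk described above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based advance: only correct when `count` is a power of two,
 * because only then is (count - 1) an all-ones bit mask. */
uint16_t next_masked(uint16_t idx, uint16_t count)
{
	return (uint16_t)((idx + 1) & (count - 1));
}

/* Boundary-check advance: correct for any ring size greater than zero. */
uint16_t next_checked(uint16_t idx, uint16_t count)
{
	assert(count > 0 && idx < count);
	idx++;
	return idx == count ? 0 : idx;
}
```

Both helpers agree for power-of-2 sizes such as 512; they diverge as soon as the size has any zero bit below its top bit.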
      
      Fixes: c8a8ca34 ("i40e: remove unnecessary memory writes of the next to clean pointer")
      Reported-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
      Tested-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20231018163908.40841-1-maciej.fijalkowski@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      913eda2b
  5. 19 Oct, 2023 10 commits
    • Merge tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · ce55c22e
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Including fixes from bluetooth, netfilter, WiFi.
      
        Feels like an up-tick in regression fixes, mostly for older releases.
        The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as
        fairly clear-cut user reported regressions. The mlx5 DMA bug was
        causing strife for 390x folks. The fixes themselves are not
        particularly scary, tho. No open investigations / outstanding reports
        at the time of writing.
      
        Current release - regressions:
      
         - eth: mlx5: perform DMA operations in the right locations, make
           devices usable on s390x, again
      
         - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner
           curve, previous fix of rejecting invalid config broke some scripts
      
         - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock
      
         - revert "ethtool: Fix mod state of verbose no_mask bitset", needs
           more work
      
        Current release - new code bugs:
      
         - tcp: fix listen() warning with v4-mapped-v6 address
      
        Previous releases - regressions:
      
         - tcp: allow tcp_disconnect() again when threads are waiting, it was
           denied to plug a constant source of bugs but turns out .NET depends
           on it
      
         - eth: mlx5: fix double-free if buffer refill fails under OOM
      
         - revert "net: wwan: iosm: enable runtime pm support for 7560", it's
           causing regressions and the WWAN team at Intel disappeared
      
         - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a
           single skb, fix single-stream perf regression on some devices
      
        Previous releases - always broken:
      
         - Bluetooth:
            - fix issues in legacy BR/EDR PIN code pairing
            - correctly bounds check and pad HCI_MON_NEW_INDEX name
      
         - netfilter:
            - more fixes / follow ups for the large "commit protocol" rework,
              which went in as a fix to 6.5
            - fix null-derefs on netlink attrs which user may not pass in
      
         - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless
           Debian for keeping HZ=250 alive)
      
         - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent
           letting frankenstein UDP super-frames from getting into the stack
      
         - net: fix interface altnames when ifc moves to a new namespace
      
         - eth: qed: fix the size of the RX buffers
      
         - mptcp: avoid sending RST when closing the initial subflow"
      
      * tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
        Revert "ethtool: Fix mod state of verbose no_mask bitset"
        selftests: mptcp: join: no RST when rm subflow/addr
        mptcp: avoid sending RST when closing the initial subflow
        mptcp: more conservative check for zero probes
        tcp: check mptcp-level constraints for backlog coalescing
        selftests: mptcp: join: correctly check for no RST
        net: ti: icssg-prueth: Fix r30 CMDs bitmasks
        selftests: net: add very basic test for netdev names and namespaces
        net: move altnames together with the netdevice
        net: avoid UAF on deleted altname
        net: check for altname conflicts when changing netdev's netns
        net: fix ifname in netlink ntf during netns move
        net: ethernet: ti: Fix mixed module-builtin object
        net: phy: bcm7xxx: Add missing 16nm EPHY statistics
        ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr
        tcp_bpf: properly release resources on error paths
        net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve
        net: mdio-mux: fix C45 access returning -EIO after API change
        tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb
        octeon_ep: update BQL sent bytes before ringing doorbell
        ...
      ce55c22e
    • Merge tag 'loongarch-fixes-6.6-3' of... · 74e9347e
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix 4-level pagetable building, disable WUC for pgprot_writecombine()
        like ioremap_wc(), use correct annotation for exception handlers, and
        a trivial cleanup"
      
      * tag 'loongarch-fixes-6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: Disable WUC for pgprot_writecombine() like ioremap_wc()
        LoongArch: Replace kmap_atomic() with kmap_local_page() in copy_user_highpage()
        LoongArch: Export symbol invalid_pud_table for modules building
        LoongArch: Use SYM_CODE_* to annotate exception handlers
      74e9347e
    • Merge tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · 54fb58ae
      Linus Torvalds authored
      Pull slab fix from Vlastimil Babka:
      
       - stable fix to prevent kernel warnings with KASAN_HW_TAGS on arm64
         due to improperly resolved kmalloc alignment restrictions (Catalin
         Marinas)
      
      * tag 'slab-fixes-for-6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm: slab: Do not create kmalloc caches smaller than arch_slab_minalign()
      54fb58ae
    • Merge tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 189b7562
      Linus Torvalds authored
      Pull seccomp fix from Kees Cook:
      
       - Fix seccomp_unotify perf benchmark for 32-bit (Jiri Slaby)
      
      * tag 'seccomp-v6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        perf/benchmark: fix seccomp_unotify benchmark for 32-bit
      189b7562
    • Merge tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · ea1cc20c
      Linus Torvalds authored
      Pull vfs fix from Christian Brauner:
       "An openat() call from io_uring triggering an audit call can apparently
        cause the refcount of struct filename to be incremented from multiple
        threads concurrently during async execution, triggering a refcount
        underflow and hitting a BUG_ON(). That bug has been lurking around
        since at least v5.16 apparently.
      
        Switch to an atomic counter to fix that. The underflow check is
        downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily
        remove that check altogether tbh"
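
As a rough illustration of the approach described in this pull message, the sketch below keeps a reference count in a C11 atomic and downgrades the underflow case to a warning. It is a userspace approximation under stated assumptions: struct name_ref, name_get() and name_put() are hypothetical names, and the kernel uses its own atomic helpers and WARN_ON_ONCE() instead of stdatomic and fprintf().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical object with an atomic reference count. */
struct name_ref {
	atomic_int refcnt;
};

/* Take a reference; safe to call from several threads at once. */
void name_get(struct name_ref *n)
{
	assert(n != NULL);
	atomic_fetch_add_explicit(&n->refcnt, 1, memory_order_relaxed);
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and may free the object. An underflow is reported, not fatal. */
bool name_put(struct name_ref *n)
{
	int old = atomic_fetch_sub_explicit(&n->refcnt, 1,
					    memory_order_acq_rel);

	if (old <= 0) {
		fprintf(stderr, "name_put: refcount underflow\n");
		return false;
	}
	return old == 1;
}
```

Because the increment and decrement are single atomic read-modify-write operations, two threads cannot lose an update the way a plain `refcnt++` could.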
      
      * tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        audit,io_uring: io_uring openat triggers audit reference count underflow
      ea1cc20c
    • Revert "ethtool: Fix mod state of verbose no_mask bitset" · 52451502
      Kory Maincent authored
      This reverts commit 108a36d0.
      
      It was reported that this fix makes it impossible to remove existing WoL
      flags. For example:
      ~$ ethtool lan2
      ...
              Supports Wake-on: pg
              Wake-on: d
      ...
      ~$ ethtool -s lan2 wol gp
      ~$ ethtool lan2
      ...
              Wake-on: pg
      ...
      ~$ ethtool -s lan2 wol d
      ~$ ethtool lan2
      ...
              Wake-on: pg
      ...
      
      This worked correctly before this commit because we were always updating
      a zero bitmap (since commit 66991703 ("ethtool: fix application of
      verbose no_mask bitset"), that is) so that the rest was left zero
      naturally. But now the 1->0 change (old_val is true, bit not present in
      netlink nest) no longer works.
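
The behaviour described above can be sketched as follows, assuming that a no_mask bit list means "the new value is exactly the listed bits". apply_bit_list() is a hypothetical helper for illustration, not the ethtool netlink code, and the bit positions used with it are arbitrary. Starting from a zero bitmap is what lets an omitted bit go from 1 to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the new value from a list of bit positions and report through
 * `mod` (optional) whether it differs from the old value. */
uint32_t apply_bit_list(uint32_t old_val, const unsigned int *bits,
			size_t n, int *mod)
{
	uint32_t new_val = 0;	/* zero bitmap: unlisted bits end up cleared */

	for (size_t i = 0; i < n; i++) {
		assert(bits[i] < 32);
		new_val |= UINT32_C(1) << bits[i];
	}
	if (mod)
		*mod = (new_val != old_val);
	return new_val;
}
```

Passing an empty list against a non-zero old value yields 0 and reports a modification, which corresponds to the `wol d` step in the transcript above.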
      
      Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Closes: https://lore.kernel.org/netdev/20231019095140.l6fffnszraeb6iiw@lion.mk-sys.cz/
      Cc: stable@vger.kernel.org
      Fixes: 108a36d0 ("ethtool: Fix mod state of verbose no_mask bitset")
      Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Link: https://lore.kernel.org/r/20231019-feature_ptp_bitset_fix-v1-1-70f3c429a221@bootlin.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      52451502
    • Merge tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3 · f69d00d1
      Linus Torvalds authored
      Pull ntfs3 fixes from Konstantin Komarov:
      
       - memory leak
      
       - some logic errors, NULL dereferences
      
       - some code was refactored
      
       - more sanity checks
      
      * tag 'ntfs3_for_6.6' of https://github.com/Paragon-Software-Group/linux-ntfs3:
        fs/ntfs3: Avoid possible memory leak
        fs/ntfs3: Fix directory element type detection
        fs/ntfs3: Fix possible null-pointer dereference in hdr_find_e()
        fs/ntfs3: Fix OOB read in ntfs_init_from_boot
        fs/ntfs3: fix panic about slab-out-of-bounds caused by ntfs_list_ea()
        fs/ntfs3: Fix NULL pointer dereference on error in attr_allocate_frame()
        fs/ntfs3: Fix possible NULL-ptr-deref in ni_readpage_cmpr()
        fs/ntfs3: Do not allow to change label if volume is read-only
        fs/ntfs3: Add more info into /proc/fs/ntfs3/<dev>/volinfo
        fs/ntfs3: Refactoring and comments
        fs/ntfs3: Fix alternative boot searching
        fs/ntfs3: Allow repeated call to ntfs3_put_sbi
        fs/ntfs3: Use inode_set_ctime_to_ts instead of inode_set_ctime
        fs/ntfs3: Fix shift-out-of-bounds in ntfs_fill_super
        fs/ntfs3: fix deadlock in mark_as_free_ex
        fs/ntfs3: Add more attributes checks in mi_enum_attr()
        fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)
        fs/ntfs3: Write immediately updated ntfs state
        fs/ntfs3: Add ckeck in ni_update_parent()
      f69d00d1
    • Merge branch 'mptcp-fixes-for-v6-6' · 1c1f14f9
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for v6.6
      
      Patch 1 corrects the logic for MP_JOIN tests where 0 RSTs are expected.
      
      Patch 2 ensures MPTCP packets are not incorrectly coalesced in the TCP
      backlog queue.
      
      Patch 3 avoids a zero-window probe and associated WARN_ON_ONCE() in an
      expected MPTCP reinjection scenario.
      
      Patches 4 & 5 allow an initial MPTCP subflow to be closed cleanly
      instead of always sending RST. Associated selftest is updated.
      ====================
      
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-0-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1c1f14f9
    • selftests: mptcp: join: no RST when rm subflow/addr · 2cfaa8b3
      Matthieu Baerts authored
      Recently, we noticed that some RSTs were wrongly generated when removing
      the initial subflow.
      
      This patch makes sure RSTs are not sent when removing any subflows or any
      addresses.
      
      Fixes: c2b2ae39 ("mptcp: handle correctly disconnect() failures")
      Cc: stable@vger.kernel.org
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-5-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2cfaa8b3
    • mptcp: avoid sending RST when closing the initial subflow · 14c56686
      Geliang Tang authored
      When closing the first subflow, the MPTCP protocol unconditionally
      calls tcp_disconnect(), which in turn generates a reset if the subflow
      is established.
      
      That is unexpected and different from what MPTCP does with MPJ
      subflows, where resets are generated only on FASTCLOSE and other edge
      scenarios.
      
      We can't reuse for the first subflow the code already in place for MPJ
      subflows: MPTCP cleans those up completely via a tcp_close() call,
      while it must keep the first subflow socket alive for later reuse, due
      to implementation constraints.
      
      This patch adds a new helper, __mptcp_subflow_disconnect(), that
      encapsulates logic similar to tcp_close(), issuing a reset only when
      the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
      otherwise.
      
      Fixes: c2b2ae39 ("mptcp: handle correctly disconnect() failures")
      Cc: stable@vger.kernel.org
      Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Geliang Tang <geliang.tang@suse.com>
      Signed-off-by: Mat Martineau <martineau@kernel.org>
      Link: https://lore.kernel.org/r/20231018-send-net-20231018-v1-4-17ecb002e41d@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      14c56686