1. 01 Feb, 2024 23 commits
    • selftests: net: Remove executable bits from library scripts · 9d851dd4
      Benjamin Poirier authored
      setup_loopback.sh and net_helper.sh are meant to be sourced from other
      scripts, not executed directly. Therefore, remove the executable bits from
      those files' permissions.
      
      This change is similar to commit 49078c1b ("selftests: forwarding:
      Remove executable bits from lib.sh")
      
      Fixes: 7d157501 ("selftests/net: GRO coalesce test")
      Fixes: 3bdd9fd2 ("selftests/net: synchronize udpgro tests' tx and rx connection")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Link: https://lore.kernel.org/r/20240131140848.360618-4-bpoirier@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: bonding: Check initial state · 8cc063ae
      Benjamin Poirier authored
      The purpose of the test_LAG_cleanup() function is to check that some
      hardware addresses are removed from underlying devices after they have been
      unenslaved. The test function simply checks that those addresses are not
      present at the end. However, if the addresses were never added to begin
      with due to some error in device setup, the test function currently passes.
      This is a false positive since in that situation the test did not actually
      exercise the intended functionality.
      
      Add a check that the expected addresses are indeed present after device
      setup. This makes the test function more robust.
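The false positive described above can be sketched in a few lines. This is a hypothetical Python model, not the selftest's shell code: the function name `check_cleanup` and its arguments are assumptions made for illustration. It only shows why the "absent at the end" check needs a matching "present after setup" check.

```python
def check_cleanup(addrs_after_setup, addrs_after_release, expected):
    """Return "OK" only if the expected addresses were added and then removed."""
    # Precondition: without this check, a failed device setup (addresses
    # never added) would be indistinguishable from a successful cleanup.
    missing = [a for a in expected if a not in addrs_after_setup]
    if missing:
        return "FAIL: setup did not add " + ", ".join(missing)
    # Postcondition: the original test only performed this part.
    leftover = [a for a in expected if a in addrs_after_release]
    if leftover:
        return "FAIL: not removed " + ", ".join(leftover)
    return "OK"
```

With an empty address set after setup, the sketch reports a failure instead of passing silently.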
      
      I noticed this problem when running the team/dev_addr_lists.sh test on a
      system without support for dummy and ipv6:
      
      tools/testing/selftests/drivers/net/team# ./dev_addr_lists.sh
      Error: Unknown device type.
      Error: Unknown device type.
      This program is not intended to be run as root.
      RTNETLINK answers: Operation not supported
      TEST: team cleanup mode lacp                                        [ OK ]
      
      Fixes: bbb774d9 ("net: Add tests for bonding and team address list management")
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Link: https://lore.kernel.org/r/20240131140848.360618-3-bpoirier@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: team: Add missing config options · 7b6fb305
      Benjamin Poirier authored
      Similar to commit dd2d40ac ("selftests: bonding: Add more missing
      config options"), add more networking-specific config options which are
      needed for team device tests.
      
      For testing, I used the minimal config generated by virtme-ng and I added
      the options in the config file. Afterwards, the team device test passed.
      
      Fixes: bbb774d9 ("net: Add tests for bonding and team address list management")
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Link: https://lore.kernel.org/r/20240131140848.360618-2-bpoirier@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • hv_netvsc: Fix race condition between netvsc_probe and netvsc_remove · e0526ec5
      Souradeep Chakrabarti authored
      In commit ac504767 ("hv_netvsc: Disable NAPI before closing the
      VMBus channel"), napi_disable() was called for all channels,
      including all subchannels, without confirming whether they were enabled.
      
      This caused hv_netvsc to hang in napi_disable() when netvsc_probe()
      has finished running but nvdev->subchan_work has not started yet.
      In that case netvsc_subchan_work() -> rndis_set_subchannel() has not
      created the sub-channels, so netvsc_sc_open() has not run.
      netvsc_remove() calls cancel_work_sync(&nvdev->subchan_work), so
      netvsc_subchan_work() never runs.
      
      netif_napi_add() sets the NAPI_STATE_SCHED bit to ensure that NAPI
      cannot be scheduled. netvsc_sc_open() -> napi_enable() then clears the
      NAPIF_STATE_SCHED bit so that it can be scheduled. napi_disable() does
      the opposite.
      
      Now during netvsc_device_remove(), when napi_disable() is called for
      those subchannels, it gets stuck in an endless msleep() loop.
      
      This fix addresses the problem by ensuring that napi_disable() is not
      called for NAPI structs that were never enabled.
      netif_napi_del() is still necessary for these non-enabled NAPI structs,
      for cleanup purposes.
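The hang can be illustrated with a toy model of the scheduling bit. This is a Python sketch under stated assumptions, not kernel code: the class `Napi`, the `enabled` flag, the `max_sleeps` cut-off and the `remove()` helper are all hypothetical. The real napi_disable() sleeps without bound, and the actual driver may track enabled channels differently.

```python
class Napi:
    """Toy model of the SCHED bit described above (not kernel code)."""

    def __init__(self):
        # Mirrors netif_napi_add(): the bit starts set, so the instance
        # cannot be scheduled until it has been enabled.
        self.sched = True
        self.enabled = False

    def enable(self):
        # Mirrors napi_enable(): clear the bit so polling can be scheduled.
        self.sched = False
        self.enabled = True

    def disable(self, max_sleeps=3):
        # Mirrors napi_disable(): wait until the bit can be taken.
        # The model gives up after max_sleeps so the hang is observable.
        sleeps = 0
        while self.sched:
            if sleeps == max_sleeps:
                return False  # the real function would sleep forever
            sleeps += 1
        self.sched = True
        self.enabled = False
        return True


def remove(channels):
    """Disable only the instances that were enabled; return how many."""
    disabled = 0
    for napi in channels:
        if napi.enabled:  # the guard, in spirit
            assert napi.disable()
            disabled += 1
        # a netif_napi_del() equivalent would still run for every instance
    return disabled
```

Calling `Napi().disable()` on a never-enabled instance returns `False` in the model, which corresponds to the endless sleep in the trace below.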
      
      Call trace:
      [  654.559417] task:modprobe        state:D stack:    0 pid: 2321 ppid:  1091 flags:0x00004002
      [  654.568030] Call Trace:
      [  654.571221]  <TASK>
      [  654.573790]  __schedule+0x2d6/0x960
      [  654.577733]  schedule+0x69/0xf0
      [  654.581214]  schedule_timeout+0x87/0x140
      [  654.585463]  ? __bpf_trace_tick_stop+0x20/0x20
      [  654.590291]  msleep+0x2d/0x40
      [  654.593625]  napi_disable+0x2b/0x80
      [  654.597437]  netvsc_device_remove+0x8a/0x1f0 [hv_netvsc]
      [  654.603935]  rndis_filter_device_remove+0x194/0x1c0 [hv_netvsc]
      [  654.611101]  ? do_wait_intr+0xb0/0xb0
      [  654.615753]  netvsc_remove+0x7c/0x120 [hv_netvsc]
      [  654.621675]  vmbus_remove+0x27/0x40 [hv_vmbus]
      
      Cc: stable@vger.kernel.org
      Fixes: ac504767 ("hv_netvsc: Disable NAPI before closing the VMBus channel")
      Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/1706686551-28510-1-git-send-email-schakrabarti@linux.microsoft.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • xen-netback: properly sync TX responses · 7b55984c
      Jan Beulich authored
      Invoking the make_tx_response() / push_tx_responses() pair with no lock
      held would be acceptable only if all such invocations happened from the
      same context (NAPI instance or dealloc thread). Since this isn't the
      case, and since the interface "spec" also doesn't demand that multicast
      operations may only be performed with no in-flight transmits,
      MCAST_{ADD,DEL} processing also needs to acquire the response lock
      around the invocations.
      
      To prevent similar mistakes going forward, "downgrade" the present
      functions to private helpers of just the two remaining ones using them
      directly, with no forward declarations anymore. This involves renaming
      what so far was make_tx_response(), for the new function of that name
      to serve the new (wrapper) purpose.
      
      While there,
      - constify the txp parameters,
      - correct xenvif_idx_release()'s status parameter's type,
      - rename {,_}make_tx_response()'s status parameters for consistency with
        xenvif_idx_release()'s.
      
      Fixes: 210c34dc ("xen-netback: add support for multicast control")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Link: https://lore.kernel.org/r/980c6c3d-e10e-4459-8565-e8fbde122f00@suse.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sysfs: Fix /sys/class/net/<iface> path · ae3f4b44
      Breno Leitao authored
      The documentation is pointing to the wrong path for the interface.
      Documentation is pointing to /sys/class/<iface>, instead of
      /sys/class/net/<iface>.
      
      Fix it by adding the `net/` directory before the interface.
      
      Fixes: 1a02ef76 ("net: sysfs: add documentation entries for /sys/class/<iface>/queues")
      Signed-off-by: Breno Leitao <leitao@debian.org>
      Link: https://lore.kernel.org/r/20240131102150.728960-2-leitao@debian.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • octeontx2-pf: Remove xdp queues on program detach · 04f647c8
      Geetha sowjanya authored
      XDP queues are created/destroyed when an XDP program
      is attached/detached. In the current driver, xdp_queues are not
      destroyed on program exit due to incorrect xdp_queue
      and tot_tx_queue count values.
      
      This patch fixes the issue by setting tot_tx_queue and xdp_queue
      count to correct values. It also fixes xdp.data_hard_start address.
      
      Fixes: 06059a1a ("octeontx2-pf: Add XDP support to netdev PF")
      Signed-off-by: Geetha sowjanya <gakula@marvell.com>
      Link: https://lore.kernel.org/r/20240130120610.16673-1-gakula@marvell.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'selftests-net-a-few-pmtu-sh-fixes' · b59af304
      Jakub Kicinski authored
      Paolo Abeni says:
      
      ====================
      selftests: net: a few pmtu.sh fixes
      
      This series tries to address CI failures for the pmtu.sh tests. It
      does _not_ attempt to enable all the currently skipped cases, to
      avoid adding more entropy.
      
      Tested with:
      
      make -C tools/testing/selftests/ TARGETS=net install
      vng --build  --config tools/testing/selftests/net/config
      vng --run . --user root -- \
      	./tools/testing/selftests/kselftest_install/run_kselftest.sh \
      	-t net:pmtu.sh
      ====================
      
      Link: https://lore.kernel.org/r/cover.1706635101.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: don't access /dev/stdout in pmtu.sh · bc0970d5
      Paolo Abeni authored
      When running pmtu.sh via the kselftest infra, accessing
      /dev/stdout gives unexpected results:
        # dd: failed to open '/dev/stdout': Device or resource busy
        # TEST: IPv4, bridged vxlan4: PMTU exceptions                         [FAIL]
      
      Let dd use the standard output directly to fix the above:
        # TEST: IPv4, bridged vxlan4: PMTU exceptions - nexthop objects       [ OK ]
      
      Fixes: 136a1b43 ("selftests: net: test vxlan pmtu exceptions with tcp")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/23d7592c5d77d75cff9b34f15c227f92e911c2ae.1706635101.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: fix available tunnels detection · e4e4b6d5
      Paolo Abeni authored
      The pmtu.sh test tries to detect the tunnel protocols available
      in the running kernel and properly skip the unsupported cases.
      
      In a few more complex setups, such detection is unsuccessful, as
      the script currently ignores some intermediate error codes at
      setup time.
      
      Before:
        # which: no nettest in (/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin)
        # TEST: vti6: PMTU exceptions (ESP-in-UDP)                            [FAIL]
        #   PMTU exception wasn't created after creating tunnel exceeding link layer MTU
        # ./pmtu.sh: line 931: kill: (7543) - No such process
        # ./pmtu.sh: line 931: kill: (7544) - No such process
      
      After:
        #   xfrm4 not supported
        # TEST: vti4: PMTU exceptions                                         [SKIP]
      
      Fixes: ece1278a ("selftests: net: add ESP-in-UDP PMTU test")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/cab10e75fda618e6fff8c595b632f47db58b9309.1706635101.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: net: add missing config for pmtu.sh tests · f7c25d8e
      Paolo Abeni authored
      The pmtu.sh test uses a few Kconfig options that are still missing
      from the net config; add them.
      
      Before:
        # Error: Specified qdisc kind is unknown.
        # Error: Specified qdisc kind is unknown.
        # Error: Qdisc not classful.
        # We have an error talking to the kernel
        # Error: Qdisc not classful.
        # We have an error talking to the kernel
        #   policy_routing not supported
        # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions                     [SKIP]
      
      After:
        # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions                     [ OK ]
      
      Fixes: ec730c3e ("selftest: net: Test IPv4 PMTU exceptions with DSCP and ECN")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/8d27bf6762a5c7b3acc457d6e6872c533040f9c1.1706635101.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'pds_core-various-fixes' · ebe31214
      Jakub Kicinski authored
      Brett Creeley says:
      
      ====================
      pds_core: Various fixes
      
      This series includes the following changes:
      
      There can be many users of the pds_core's adminq. This includes
      pds_core's uses and any clients that depend on it. When the pds_core
      device goes through a reset for any reason the adminq is freed
      and reconfigured. There are some gaps in the current implementation
      that will cause crashes during reset if any of the previously mentioned
      users of the adminq attempt to use it after it's been freed.
      
      The series also addresses issues around how resets are handled,
      specifically regarding the driver's error handlers.
      
      Originally these patches were aimed at net-next, but it was requested to
      push the fixes patches to net. The original patches can be found here:
      
      https://lore.kernel.org/netdev/20240126174255.17052-1-brett.creeley@amd.com/
      
      Also, the Reviewed-by tags were left in place from net-next reviews as the
      patches didn't change.
      ====================
      
      Link: https://lore.kernel.org/r/20240129234035.69802-1-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: Rework teardown/setup flow to be more common · bc90fbe0
      Brett Creeley authored
      Currently the teardown/setup flow for driver probe/remove is quite
      a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up().
      One key piece that's missing is the calls to pci_alloc_irq_vectors()
      and pci_free_irq_vectors(). The pcie reset case is calling
      pci_free_irq_vectors() on reset_prepare, but not calling the
      corresponding pci_alloc_irq_vectors() on reset_done. This is causing
      unexpected/unwanted interrupt behavior due to the adminq interrupt
      being accidentally put into legacy interrupt mode. Also, the
      pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are being
      called directly in probe/remove respectively.
      
      Fix this inconsistency by making the following changes:
        1. Always call pdsc_dev_init() in pdsc_setup(), which calls
           pci_alloc_irq_vectors() and get rid of the now unused
           pds_dev_reinit().
        2. Always free/clear the pdsc->intr_info in pdsc_teardown()
           since this structure will get re-alloced in pdsc_setup().
        3. Move the calls of pci_free_irq_vectors() to pdsc_teardown()
           since pci_alloc_irq_vectors() will always be called in
           pdsc_setup()->pdsc_dev_init() for both the probe/remove and
           reset flows.
        4. Make sure to only create the debugfs "identity" entry when it
           doesn't already exist, which it will in the reset case because
           it's already been created in the initial call to pdsc_dev_init().
      
      Fixes: ffa55858 ("pds_core: implement pci reset handlers")
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: Clear BARs on reset · e96094c1
      Brett Creeley authored
      During reset the BARs might be accessed when they are
      unmapped. This can cause unexpected issues, so fix it by
      clearing the cached BAR values so they are not accessed
      until they are re-mapped.
      
      Also, guard every place that could access the BARs
      while they are NULL.
      
      Fixes: 49ce92fb ("pds_core: add FW update feature to devlink")
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: Prevent race issues involving the adminq · 7e82a874
      Brett Creeley authored
      There are multiple paths that can result in using the pdsc's
      adminq.
      
      [1] pdsc_adminq_isr and the resulting work from queue_work(),
          i.e. pdsc_work_thread()->pdsc_process_adminq()
      
      [2] pdsc_adminq_post()
      
      When the device goes through reset via PCIe reset and/or
      a fw_down/fw_up cycle due to bad PCIe state or bad device
      state the adminq is destroyed and recreated.
      
      A NULL pointer dereference can happen if [1] or [2] happens
      after the adminq is already destroyed.
      
      In order to fix this, add some further state checks and
      implement reference counting for adminq uses. Reference
      counting was used because multiple threads can attempt to
      access the adminq at the same time via [1] or [2]. Additionally,
      multiple clients (i.e. pds-vfio-pci) can be using [2]
      at the same time.
      
      The adminq_refcnt is initialized to 1 when the adminq has been
      allocated and is ready to use. Users/clients of the adminq
      (i.e. [1] and [2]) will increment the refcnt when they are using
      the adminq. When the driver goes into a fw_down cycle it will
      set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt
      to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent
      any further adminq_refcnt increments. Waiting for the
      adminq_refcnt to hit 1 allows for any current users of the adminq
      to finish before the driver frees the adminq. Once the
      adminq_refcnt hits 1 the driver clears the refcnt to signify that
      the adminq is deleted and cannot be used. On the fw_up cycle the
      driver will once again initialize the adminq_refcnt to 1 allowing
      the adminq to be used again.
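The reference-counting scheme in the paragraph above can be modelled as follows. This is a single-threaded Python sketch, not driver code: the class `AdminQ` and its method names are hypothetical, the wait for the count to reach 1 is reduced to a check that the caller would retry, and clearing the dead flag in `fw_up()` is an assumption of the model.

```python
class AdminQ:
    """Toy model of the adminq_refcnt scheme (not driver code)."""

    def __init__(self):
        self.fw_dead = False
        self.refcnt = 1  # 1 means "allocated and idle"

    def get(self):
        # Users take a reference only while the queue is alive.
        if self.fw_dead or self.refcnt == 0:
            return False
        self.refcnt += 1
        return True

    def put(self):
        self.refcnt -= 1

    def fw_down_begin(self):
        # Set the dead flag first so that no new references can be taken.
        self.fw_dead = True

    def fw_down_finish(self):
        # Only once in-flight users are gone (count back at 1) is the
        # queue marked deleted by clearing the count.
        if self.refcnt != 1:
            return False  # the driver would keep waiting here
        self.refcnt = 0
        return True

    def fw_up(self):
        # Re-arm the queue: the count goes back to 1.
        self.refcnt = 1
        self.fw_dead = False
```

In the model, a user that holds a reference blocks `fw_down_finish()` until it calls `put()`, and any `get()` issued after `fw_down_begin()` is refused.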
      
      Fixes: 01ba61b5 ("pds_core: Add adminq processing and commands")
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: Use struct pdsc for the pdsc_adminq_isr private data · 95170515
      Brett Creeley authored
      The initial design for the adminq interrupt was done based
      on client drivers having their own adminq and adminq
      interrupt. So, each client driver's adminq isr would use
      their specific adminqcq for the private data struct. For the
      time being the design has changed to only use a single
      adminq for all clients. So, instead use the struct pdsc for
      the private data to simplify things a bit.
      
      This also has the benefit of not dereferencing the adminqcq
      to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit
      is set and the adminqcq has actually been cleared/freed.
      
      Fixes: 01ba61b5 ("pds_core: Add adminq processing and commands")
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: Cancel AQ work on teardown · d321067e
      Brett Creeley authored
      There is a small window where pdsc_work_thread()
      calls pdsc_process_adminq() and pdsc_process_adminq()
      passes the PDSC_S_STOPPING_DRIVER check and starts
      to process adminq/notifyq work and then the driver
      starts a fw_down cycle. This could cause some
      undefined behavior if the notifyqcq/adminqcq are
      freed while pdsc_process_adminq() is running. Use
      cancel_work_sync() on the adminqcq's work struct
      to make sure any pending work items are cancelled
      and any in progress work items are completed.
      
      Also, make sure not to call cancel_work_sync() if
      the work item has not been initialized. Without this,
      traces will happen in cases where a reset fails and
      teardown is called again or if reset fails and the
      driver is removed.
      
      Fixes: 01ba61b5 ("pds_core: Add adminq processing and commands")
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • pds_core: Prevent health thread from running during reset/remove · d9407ff1
      Brett Creeley authored
      The PCIe reset handlers can run at the same time as the
      health thread. This can cause the health thread to
      stomp on the PCIe reset. Fix this by preventing the
      health thread from running while a PCIe reset is happening.
      
      As part of this use timer_shutdown_sync() during reset and
      remove to make sure the timer doesn't ever get rearmed.
      
      Fixes: ffa55858 ("pds_core: implement pci reset handlers")
      Signed-off-by: Brett Creeley <brett.creeley@amd.com>
      Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Link: https://lore.kernel.org/r/20240129234035.69802-2-brett.creeley@amd.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_unix: fix lockdep positive in sk_diag_dump_icons() · 4d322dce
      Eric Dumazet authored
      syzbot reported a lockdep splat [1].
      
      The blamed commit hinted at the possible lockdep
      violation, and the code used unix_state_lock_nested()
      in an attempt to silence lockdep.
      
      It is not sufficient, because unix_state_lock_nested()
      is already used from unix_state_double_lock().
      
      We need to use a separate subclass.
      
      This patch adds a distinct enumeration to make things
      more explicit.
      
      Also use swap() in unix_state_double_lock() as a clean up.
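The swap-based ordering mentioned above follows a common pattern: when two locks of the same class must be held together, always take them in one fixed global order. The following Python sketch illustrates that pattern only. The names `Sock`, `double_lock` and `double_unlock` are hypothetical, `id()` stands in for a pointer comparison, and the handling of identical sockets is an assumption of the sketch. Lockdep subclasses have no Python equivalent and appear only as a comment.

```python
import threading


class Sock:
    def __init__(self):
        self.lock = threading.Lock()


def double_lock(sk1, sk2):
    """Take both locks in a fixed global order; return the order used."""
    if sk1 is sk2:
        sk1.lock.acquire()
        return [sk1]
    # Swap so that sk1 is always the one with the lower key.
    if id(sk1) > id(sk2):
        sk1, sk2 = sk2, sk1
    sk1.lock.acquire()
    sk2.lock.acquire()  # in the kernel, this one would carry a subclass
    return [sk1, sk2]


def double_unlock(held):
    # Release in reverse acquisition order.
    for sk in reversed(held):
        sk.lock.release()
```

Because both call orders, `double_lock(a, b)` and `double_lock(b, a)`, acquire the locks in the same sequence, two threads locking the same pair cannot deadlock against each other.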
      
      v2: add a missing inline keyword to unix_state_lock_nested()
      
      [1]
      WARNING: possible circular locking dependency detected
      6.8.0-rc1-syzkaller-00356-g8a696a29 #0 Not tainted
      
      syz-executor.1/2542 is trying to acquire lock:
       ffff88808b5df9e8 (rlock-AF_UNIX){+.+.}-{2:2}, at: skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
      
      but task is already holding lock:
       ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&u->lock/1){+.+.}-{2:2}:
              lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
              _raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
              sk_diag_dump_icons net/unix/diag.c:87 [inline]
              sk_diag_fill+0x6ea/0xfe0 net/unix/diag.c:157
              sk_diag_dump net/unix/diag.c:196 [inline]
              unix_diag_dump+0x3e9/0x630 net/unix/diag.c:220
              netlink_dump+0x5c1/0xcd0 net/netlink/af_netlink.c:2264
              __netlink_dump_start+0x5d7/0x780 net/netlink/af_netlink.c:2370
              netlink_dump_start include/linux/netlink.h:338 [inline]
              unix_diag_handler_dump+0x1c3/0x8f0 net/unix/diag.c:319
             sock_diag_rcv_msg+0xe3/0x400
              netlink_rcv_skb+0x1df/0x430 net/netlink/af_netlink.c:2543
              sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:280
              netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
              netlink_unicast+0x7e6/0x980 net/netlink/af_netlink.c:1367
              netlink_sendmsg+0xa37/0xd70 net/netlink/af_netlink.c:1908
              sock_sendmsg_nosec net/socket.c:730 [inline]
              __sock_sendmsg net/socket.c:745 [inline]
              sock_write_iter+0x39a/0x520 net/socket.c:1160
              call_write_iter include/linux/fs.h:2085 [inline]
              new_sync_write fs/read_write.c:497 [inline]
              vfs_write+0xa74/0xca0 fs/read_write.c:590
              ksys_write+0x1a0/0x2c0 fs/read_write.c:643
              do_syscall_x64 arch/x86/entry/common.c:52 [inline]
              do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      -> #0 (rlock-AF_UNIX){+.+.}-{2:2}:
              check_prev_add kernel/locking/lockdep.c:3134 [inline]
              check_prevs_add kernel/locking/lockdep.c:3253 [inline]
              validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
              __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
              lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
              __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
              _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
              skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
              unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
              sock_sendmsg_nosec net/socket.c:730 [inline]
              __sock_sendmsg net/socket.c:745 [inline]
              ____sys_sendmsg+0x592/0x890 net/socket.c:2584
              ___sys_sendmsg net/socket.c:2638 [inline]
              __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
              __do_sys_sendmmsg net/socket.c:2753 [inline]
              __se_sys_sendmmsg net/socket.c:2750 [inline]
              __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
              do_syscall_x64 arch/x86/entry/common.c:52 [inline]
              do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
             entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&u->lock/1);
                                     lock(rlock-AF_UNIX);
                                     lock(&u->lock/1);
        lock(rlock-AF_UNIX);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor.1/2542:
        #0: ffff88808b5dfe70 (&u->lock/1){+.+.}-{2:2}, at: unix_dgram_sendmsg+0xfc7/0x2200 net/unix/af_unix.c:2089
      
      stack backtrace:
      CPU: 1 PID: 2542 Comm: syz-executor.1 Not tainted 6.8.0-rc1-syzkaller-00356-g8a696a29 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
        check_noncircular+0x366/0x490 kernel/locking/lockdep.c:2187
        check_prev_add kernel/locking/lockdep.c:3134 [inline]
        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
        validate_chain+0x1909/0x5ab0 kernel/locking/lockdep.c:3869
        __lock_acquire+0x1345/0x1fd0 kernel/locking/lockdep.c:5137
        lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
        skb_queue_tail+0x36/0x120 net/core/skbuff.c:3863
        unix_dgram_sendmsg+0x15d9/0x2200 net/unix/af_unix.c:2112
        sock_sendmsg_nosec net/socket.c:730 [inline]
        __sock_sendmsg net/socket.c:745 [inline]
        ____sys_sendmsg+0x592/0x890 net/socket.c:2584
        ___sys_sendmsg net/socket.c:2638 [inline]
        __sys_sendmmsg+0x3b2/0x730 net/socket.c:2724
        __do_sys_sendmmsg net/socket.c:2753 [inline]
        __se_sys_sendmmsg net/socket.c:2750 [inline]
        __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2750
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xf5/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      RIP: 0033:0x7f26d887cda9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f26d95a60c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00007f26d89abf80 RCX: 00007f26d887cda9
      RDX: 000000000000003e RSI: 00000000200bd000 RDI: 0000000000000004
      RBP: 00007f26d88c947a R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000008c0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007f26d89abf80 R15: 00007ffcfe081a68
      
      Fixes: 2aac7a2c ("unix_diag: Pending connections IDs NLA")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240130184235.1620738-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'selftests-net-a-couple-of-typos-fixes-in-key-management-rst-tests' · 15b87e26
      Jakub Kicinski authored
      Dmitry Safonov says:
      
      ====================
      selftests/net: A couple of typos fixes in key-management/rst tests
      
      Two typo fixes, noticed by Mohammad's review.
      And a fix for an issue that got uncovered.
      
      v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-0-d190430a6c60@arista.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      15b87e26
    • Dmitry Safonov's avatar
      selftests/net: Repair RST passive reset selftest · 6caf3adc
      Dmitry Safonov authored
      Currently, the test is racy and no longer seems to pass.
      
      To rectify it, target TCP_TW_RST.
      The sleep() part is not ideal, but it seems a reasonable compromise
      for the test. An in-line comment describes a plan for improving it,
      which will be done on top of this change; for now the priority is to
      get the test running on the netdev/patchwork selftests dashboard.
      
      It also slightly changes tcp_ao-lib in order to get SO_ERROR propagated
      to test_client_verify() return value.
      
      Fixes: c6df7b23 ("selftests/net: Add TCP-AO RST test")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-3-d190430a6c60@arista.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6caf3adc
    • Dmitry Safonov's avatar
      selftests/net: Rectify key counters checks · 384aa16d
      Dmitry Safonov authored
      As the names of (struct test_key) members didn't reflect whether the key
      was used for TX or RX, the verification for the counters was done
      incorrectly for asymmetrical selftests.
      
      Rename these with a _tx suffix and fix the checks in verify_counters().
      While at it, as the checks are now correct, introduce skip_counters_checks,
      which is intended for tests where it's expected that a key that was set
      with setsockopt(sk, IPPROTO_TCP, TCP_AO_INFO, ...) might have had no
      chance of getting used on the wire.
      
      Fixes the following failures, exposed by the previous commit:
      > not ok 51 server: Check current != rnext keys set before connect(): Counter pkt_good was expected to increase 0 => 0 for key 132:5
      > not ok 52 server: Check current != rnext keys set before connect(): Counter pkt_good was not expected to increase 0 => 21 for key 137:10
      >
      > not ok 63 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was expected to increase 0 => 0 for key 132:5
      > not ok 64 server: Check current flapping back on peer's RnextKey request: Counter pkt_good was not expected to increase 0 => 40 for key 137:10
      
      Cc: Mohammad Nassiri <mnassiri@ciena.com>
      Fixes: 3c3ead55 ("selftests/net: Add TCP-AO key-management test")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-2-d190430a6c60@arista.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      384aa16d
    • Mohammad Nassiri's avatar
      selftests/net: Argument value mismatch when calling verify_counters() · d8f5df1f
      Mohammad Nassiri authored
      The end_server() function only operates in the server thread
      and always takes an accept socket instead of a listen socket as
      its input argument. To align with this, invert the boolean values
      used when calling verify_counters() within the end_server() function.
      
      As a result of this typo, the test didn't correctly check the
      non-symmetrical scenario where, e.g., peer-A uses a key <100:200>
      to send data, but peer-B uses another key <105:205> to send its data.
      In simple words: different keys for TX and RX.
      
      Fixes: 3c3ead55 ("selftests/net: Add TCP-AO key-management test")
      Signed-off-by: Mohammad Nassiri <mnassiri@ciena.com>
      Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/
      [amended 'Fixes' tag, added the issue description and carried-over to lkml]
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Link: https://lore.kernel.org/r/20240130-tcp-ao-test-key-mgmt-v2-1-d190430a6c60@arista.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d8f5df1f
  2. 31 Jan, 2024 8 commits
  3. 30 Jan, 2024 7 commits
    • Eric Dumazet's avatar
      llc: call sock_orphan() at release time · aa2b2eb3
      Eric Dumazet authored
      syzbot reported an interesting trace [1] caused by a stale sk->sk_wq
      pointer in a closed llc socket.
      
      In commit ff7b11aa ("net: socket: set sock->sk to NULL after
      calling proto_ops::release()") Eric Biggers hinted that some protocols
      are missing a sock_orphan() call; we need to perform a full audit.
      
      In net-next, I plan to clear sock->sk from sock_orphan() and
      amend Eric's patch to add a warning.
      
      [1]
       BUG: KASAN: slab-use-after-free in list_empty include/linux/list.h:373 [inline]
       BUG: KASAN: slab-use-after-free in waitqueue_active include/linux/wait.h:127 [inline]
       BUG: KASAN: slab-use-after-free in sock_def_write_space_wfree net/core/sock.c:3384 [inline]
       BUG: KASAN: slab-use-after-free in sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
      Read of size 8 at addr ffff88802f4fc880 by task ksoftirqd/1/27
      
      CPU: 1 PID: 27 Comm: ksoftirqd/1 Not tainted 6.8.0-rc1-syzkaller-00049-g6098d87e #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
        print_address_description mm/kasan/report.c:377 [inline]
        print_report+0xc4/0x620 mm/kasan/report.c:488
        kasan_report+0xda/0x110 mm/kasan/report.c:601
        list_empty include/linux/list.h:373 [inline]
        waitqueue_active include/linux/wait.h:127 [inline]
        sock_def_write_space_wfree net/core/sock.c:3384 [inline]
        sock_wfree+0x9a8/0x9d0 net/core/sock.c:2468
        skb_release_head_state+0xa3/0x2b0 net/core/skbuff.c:1080
        skb_release_all net/core/skbuff.c:1092 [inline]
        napi_consume_skb+0x119/0x2b0 net/core/skbuff.c:1404
        e1000_unmap_and_free_tx_resource+0x144/0x200 drivers/net/ethernet/intel/e1000/e1000_main.c:1970
        e1000_clean_tx_irq drivers/net/ethernet/intel/e1000/e1000_main.c:3860 [inline]
        e1000_clean+0x4a1/0x26e0 drivers/net/ethernet/intel/e1000/e1000_main.c:3801
        __napi_poll.constprop.0+0xb4/0x540 net/core/dev.c:6576
        napi_poll net/core/dev.c:6645 [inline]
        net_rx_action+0x956/0xe90 net/core/dev.c:6778
        __do_softirq+0x21a/0x8de kernel/softirq.c:553
        run_ksoftirqd kernel/softirq.c:921 [inline]
        run_ksoftirqd+0x31/0x60 kernel/softirq.c:913
        smpboot_thread_fn+0x660/0xa10 kernel/smpboot.c:164
        kthread+0x2c6/0x3a0 kernel/kthread.c:388
        ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
        ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
       </TASK>
      
      Allocated by task 5167:
        kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
        kasan_save_track+0x14/0x30 mm/kasan/common.c:68
        unpoison_slab_object mm/kasan/common.c:314 [inline]
        __kasan_slab_alloc+0x81/0x90 mm/kasan/common.c:340
        kasan_slab_alloc include/linux/kasan.h:201 [inline]
        slab_post_alloc_hook mm/slub.c:3813 [inline]
        slab_alloc_node mm/slub.c:3860 [inline]
        kmem_cache_alloc_lru+0x142/0x6f0 mm/slub.c:3879
        alloc_inode_sb include/linux/fs.h:3019 [inline]
        sock_alloc_inode+0x25/0x1c0 net/socket.c:308
        alloc_inode+0x5d/0x220 fs/inode.c:260
        new_inode_pseudo+0x16/0x80 fs/inode.c:1005
        sock_alloc+0x40/0x270 net/socket.c:634
        __sock_create+0xbc/0x800 net/socket.c:1535
        sock_create net/socket.c:1622 [inline]
        __sys_socket_create net/socket.c:1659 [inline]
        __sys_socket+0x14c/0x260 net/socket.c:1706
        __do_sys_socket net/socket.c:1720 [inline]
        __se_sys_socket net/socket.c:1718 [inline]
        __x64_sys_socket+0x72/0xb0 net/socket.c:1718
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Freed by task 0:
        kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
        kasan_save_track+0x14/0x30 mm/kasan/common.c:68
        kasan_save_free_info+0x3f/0x60 mm/kasan/generic.c:640
        poison_slab_object mm/kasan/common.c:241 [inline]
        __kasan_slab_free+0x121/0x1b0 mm/kasan/common.c:257
        kasan_slab_free include/linux/kasan.h:184 [inline]
        slab_free_hook mm/slub.c:2121 [inline]
        slab_free mm/slub.c:4299 [inline]
        kmem_cache_free+0x129/0x350 mm/slub.c:4363
        i_callback+0x43/0x70 fs/inode.c:249
        rcu_do_batch kernel/rcu/tree.c:2158 [inline]
        rcu_core+0x819/0x1680 kernel/rcu/tree.c:2433
        __do_softirq+0x21a/0x8de kernel/softirq.c:553
      
      Last potentially related work creation:
        kasan_save_stack+0x33/0x50 mm/kasan/common.c:47
        __kasan_record_aux_stack+0xba/0x100 mm/kasan/generic.c:586
        __call_rcu_common.constprop.0+0x9a/0x7b0 kernel/rcu/tree.c:2683
        destroy_inode+0x129/0x1b0 fs/inode.c:315
        iput_final fs/inode.c:1739 [inline]
        iput.part.0+0x560/0x7b0 fs/inode.c:1765
        iput+0x5c/0x80 fs/inode.c:1755
        dentry_unlink_inode+0x292/0x430 fs/dcache.c:400
        __dentry_kill+0x1ca/0x5f0 fs/dcache.c:603
        dput.part.0+0x4ac/0x9a0 fs/dcache.c:845
        dput+0x1f/0x30 fs/dcache.c:835
        __fput+0x3b9/0xb70 fs/file_table.c:384
        task_work_run+0x14d/0x240 kernel/task_work.c:180
        exit_task_work include/linux/task_work.h:38 [inline]
        do_exit+0xa8a/0x2ad0 kernel/exit.c:871
        do_group_exit+0xd4/0x2a0 kernel/exit.c:1020
        __do_sys_exit_group kernel/exit.c:1031 [inline]
        __se_sys_exit_group kernel/exit.c:1029 [inline]
        __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1029
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      The buggy address belongs to the object at ffff88802f4fc800
       which belongs to the cache sock_inode_cache of size 1408
      The buggy address is located 128 bytes inside of
       freed 1408-byte region [ffff88802f4fc800, ffff88802f4fcd80)
      
      The buggy address belongs to the physical page:
      page:ffffea0000bd3e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f4f8
      head:ffffea0000bd3e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      anon flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
      page_type: 0xffffffff()
      raw: 00fff00000000840 ffff888013b06b40 0000000000000000 0000000000000001
      raw: 0000000000000000 0000000080150015 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd20d0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 4956, tgid 4956 (sshd), ts 31423924727, free_ts 0
        set_page_owner include/linux/page_owner.h:31 [inline]
        post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1533
        prep_new_page mm/page_alloc.c:1540 [inline]
        get_page_from_freelist+0xa28/0x3780 mm/page_alloc.c:3311
        __alloc_pages+0x22f/0x2440 mm/page_alloc.c:4567
        __alloc_pages_node include/linux/gfp.h:238 [inline]
        alloc_pages_node include/linux/gfp.h:261 [inline]
        alloc_slab_page mm/slub.c:2190 [inline]
        allocate_slab mm/slub.c:2354 [inline]
        new_slab+0xcc/0x3a0 mm/slub.c:2407
        ___slab_alloc+0x4af/0x19a0 mm/slub.c:3540
        __slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3625
        __slab_alloc_node mm/slub.c:3678 [inline]
        slab_alloc_node mm/slub.c:3850 [inline]
        kmem_cache_alloc_lru+0x379/0x6f0 mm/slub.c:3879
        alloc_inode_sb include/linux/fs.h:3019 [inline]
        sock_alloc_inode+0x25/0x1c0 net/socket.c:308
        alloc_inode+0x5d/0x220 fs/inode.c:260
        new_inode_pseudo+0x16/0x80 fs/inode.c:1005
        sock_alloc+0x40/0x270 net/socket.c:634
        __sock_create+0xbc/0x800 net/socket.c:1535
        sock_create net/socket.c:1622 [inline]
        __sys_socket_create net/socket.c:1659 [inline]
        __sys_socket+0x14c/0x260 net/socket.c:1706
        __do_sys_socket net/socket.c:1720 [inline]
        __se_sys_socket net/socket.c:1718 [inline]
        __x64_sys_socket+0x72/0xb0 net/socket.c:1718
        do_syscall_x64 arch/x86/entry/common.c:52 [inline]
        do_syscall_64+0xd3/0x250 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      page_owner free stack trace missing
      
      Memory state around the buggy address:
       ffff88802f4fc780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88802f4fc800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff88802f4fc880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff88802f4fc900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88802f4fc980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 43815482 ("net: sock_def_readable() and friends RCU conversion")
      Reported-and-tested-by: syzbot+32b89eaa102b372ff76d@syzkaller.appspotmail.com
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240126165532.3396702-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      aa2b2eb3
    • Paolo Abeni's avatar
      Merge branch 'net-stmmac-dwmac-imx-time-based-scheduling-support' · f8affba7
      Paolo Abeni authored
      Esben Haabendal says:
      
      ====================
      net: stmmac: dwmac-imx: Time Based Scheduling support
      
      This small patch series allows using TBS support of the i.MX Ethernet QOS
      controller for etf qdisc offload.
      It achieves this in a manner similar to dwmac-intel.c, dwmac-mediatek.c
      and stmmac_pci.c.
      
      Changes since v1:
      
      - Simplified for loop by starting at index 1.
      - Fixed problem with indentation.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1706256158.git.esben@geanix.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f8affba7
    • Esben Haabendal's avatar
      net: stmmac: dwmac-imx: set TSO/TBS TX queues default settings · 3b12ec8f
      Esben Haabendal authored
      TSO and TBS cannot coexist. For now we set i.MX Ethernet QOS controller to
      use the first TX queue with TSO and the rest for TBS.
      
      TX queues with TBS can support etf qdisc hw offload.
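      
      The default split described above can be sketched as follows. This is a
      minimal illustration, not the driver's code: struct txq_cfg and
      set_default_txq() are hypothetical names, and only the per-queue TBS flag
      is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue configuration; only the TBS flag is modelled. */
struct txq_cfg {
	bool tbs_en;
};

/* Queue 0 keeps TSO (tbs_en stays false); queues 1..n-1 are given TBS. */
static void set_default_txq(struct txq_cfg *cfg, size_t n)
{
	assert(cfg != NULL);

	/* Starting at index 1 leaves the first queue untouched. */
	for (size_t i = 1; i < n; i++)
		cfg[i].tbs_en = true;
}
```

      Because TSO and TBS cannot be enabled on the same queue, starting the
      loop at index 1 is enough to keep the two features on disjoint queues.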
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3b12ec8f
    • Esben Haabendal's avatar
      net: stmmac: do not clear TBS enable bit on link up/down · 4896bb7c
      Esben Haabendal authored
      With the dma conf being reallocated on each call to stmmac_open(), any
      information in there is lost, unless we specifically handle it.
      
      The STMMAC_TBS_EN bit is set when adding an etf qdisc, and the etf qdisc
      therefore would stop working when link was set down and then back up.
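      
      A minimal sketch of the idea, under stated assumptions: struct dma_conf,
      N_TXQ, TBS_EN and rebuild_conf() are hypothetical stand-ins rather than
      the driver's definitions. A freshly built configuration starts cleared,
      so the enable bit has to be copied over explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_TXQ 4		/* hypothetical queue count */
#define TBS_EN 0x1u	/* hypothetical bit standing in for STMMAC_TBS_EN */

struct dma_conf {
	unsigned int tbs[N_TXQ];
};

/* Build a fresh configuration, as happens on every open, but carry over the
 * TBS enable bit of each queue from the previous configuration.
 */
static void rebuild_conf(const struct dma_conf *old, struct dma_conf *fresh)
{
	assert(old != NULL && fresh != NULL);

	memset(fresh, 0, sizeof(*fresh));	/* everything starts cleared */
	for (size_t i = 0; i < N_TXQ; i++)
		fresh->tbs[i] |= old->tbs[i] & TBS_EN;
}
```

      Without the copy loop, a queue that had the bit set before the link went
      down would come back with it cleared.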
      
      Fixes: ba39b344 ("net: ethernet: stmicro: stmmac: generate stmmac dma conf before open")
      Cc: stable@vger.kernel.org
      Signed-off-by: Esben Haabendal <esben@geanix.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      4896bb7c
    • Helge Deller's avatar
      ipv6: Ensure natural alignment of const ipv6 loopback and router addresses · 60365049
      Helge Deller authored
      On a parisc64 kernel I sometimes notice this kernel warning:
      Kernel unaligned access to 0x40ff8814 at ndisc_send_skb+0xc0/0x4d8
      
      The address 0x40ff8814 points to the in6addr_linklocal_allrouters
      variable and the warning simply means that some ipv6 function tries to
      read a 64-bit word directly from the not-64-bit aligned
      in6addr_linklocal_allrouters variable.
      
      Unaligned accesses are non-critical as the architecture or exception
      handlers usually will fix it up at runtime. Nevertheless it may incur
      a performance penalty on some architectures. For details read the
      "unaligned-memory-access" kernel documentation.
      
      The patch below ensures that the ipv6 loopback and router addresses will
      always be naturally aligned. This prevents the unaligned accesses for
      all architectures.
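      
      The effect of forcing natural alignment can be illustrated in plain C11.
      This is a sketch only: union addr128, the allrouters object and
      is_aligned_for_u64() are hypothetical names, and the standard alignas
      specifier is used here in place of whatever annotation the patch applies.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical 16-byte address type. Its widest member is 32 bits, so the
 * type by itself does not ask for 8-byte alignment.
 */
union addr128 {
	uint8_t  u8[16];
	uint32_t u32[4];
};

static_assert(sizeof(union addr128) == 16, "address must be 128 bits");

/* Requesting 8-byte alignment on the object lets code read it as two
 * naturally aligned 64-bit words. The value is ff02::2.
 */
static const alignas(8) union addr128 allrouters = {
	.u8 = { 0xff, 0x02, [15] = 0x02 },
};

/* Returns nonzero when p is aligned on an 8-byte boundary. */
static int is_aligned_for_u64(const void *p)
{
	return ((uintptr_t)p % 8) == 0;
}
```

      An address such as 0x40ff8814 from the warning above is a multiple of 4
      but not of 8, which is the situation the alignment request rules out.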
      Signed-off-by: Helge Deller <deller@gmx.de>
      Fixes: 034dfc5d ("ipv6: export in6addr_loopback to modules")
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/ZbNuFM1bFqoH-UoY@p100
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      60365049
    • Jakub Kicinski's avatar
      selftests: net: add missing config for nftables-backed iptables · 59c93583
      Jakub Kicinski authored
      Modern OSes use iptables implementation with nf_tables as a backend,
      e.g.:
      
      $ iptables -V
      iptables v1.8.8 (nf_tables)
      
      Pablo points out that we need CONFIG_NFT_COMPAT to make that work,
      otherwise we see a lot of:
      
        Warning: Extension DNAT revision 0 not supported, missing kernel module?
      
      with DNAT being just an example here, other modules we need
      include udp, TTL, length etc.
      
      Link: https://lore.kernel.org/r/20240126201308.2903602-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      59c93583
    • Michal Vokáč's avatar
      net: dsa: qca8k: fix illegal usage of GPIO · c44fc98f
      Michal Vokáč authored
      When working with GPIO, its direction must be set either when the GPIO is
      requested by gpiod_get*() or later on by one of the gpiod_direction_*()
      functions. Neither is done here, which results in undefined
      behavior on some systems.
      
      As the reset GPIO is used right after it is requested here, it makes sense
      to configure it as GPIOD_OUT_HIGH right away. With that, the following
      gpiod_set_value_cansleep(1) becomes redundant and can be safely
      removed.
      
      Fixes: a653f2f5 ("net: dsa: qca8k: introduce reset via gpio feature")
      Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/1706266175-3408-1-git-send-email-michal.vokac@ysoft.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      c44fc98f
  4. 29 Jan, 2024 2 commits
    • Christophe JAILLET's avatar
      ixgbe: Fix an error handling path in ixgbe_read_iosf_sb_reg_x550() · bbc404d2
      Christophe JAILLET authored
      All error handling paths, except this one, go to 'out' where
      release_swfw_sync() is called.
      This call balances the acquire_swfw_sync() call done at the beginning of
      the function.
      
      Branch to the error handling path in order to correctly release some
      resources in case of error.
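      
      The pattern being restored is the usual single-exit cleanup. The sketch
      below uses hypothetical stand-ins (acquire_lock(), release_lock(),
      read_reg()) rather than the driver's functions, and a counter to show
      that acquire and release stay balanced on the failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the acquire/release pair. */
static int held;
static int acquire_lock(void) { held++; return 0; }
static void release_lock(void) { held--; }

/* Hypothetical register read: every failure after a successful acquire
 * jumps to 'out', so the release always balances the acquire.
 */
static int read_reg(bool fail_midway, int *value)
{
	int ret;

	assert(value != NULL);

	ret = acquire_lock();
	if (ret)
		return ret;	/* nothing acquired, nothing to release */

	if (fail_midway) {
		ret = -1;
		goto out;	/* a plain 'return ret;' here would leak the lock */
	}

	*value = 42;
out:
	release_lock();
	return ret;
}
```

      After either call path returns, the held counter is back at zero, which
      is the property the fix restores for the one path that returned early.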
      
      Fixes: ae14a1d8 ("ixgbe: Fix IOSF SB access issues")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      bbc404d2
    • Jacob Keller's avatar
      e1000e: correct maximum frequency adjustment values · f1f6a6b1
      Jacob Keller authored
      The e1000e driver supports hardware with a variety of different clock
      speeds, and thus a variety of different increment values used for
      programming its PTP hardware clock.
      
      The values currently programmed in e1000e_ptp_init are incorrect. In
      particular, only two maximum adjustments are used: 24000000 - 1, and
      600000000 - 1. These were originally intended to be used with the 96 MHz
      clock and the 25 MHz clock.
      
      Both of these values are actually slightly too high. For the 96 MHz clock,
      the actual maximum value that can safely be programmed is 23,999,938. For
      the 25 MHz clock, the maximum value is 599,999,904.
      
      Worse, several devices use a 24 MHz clock or a 38.4 MHz clock. These parts
      are incorrectly assigned either the 24 million or the 600 million value.
      For the 24 MHz clock, this is not a significant issue: its current
      increment value can support an adjustment up to 7 billion in the positive
      direction. However, the 38.4 MHz clock uses an increment value which can
      only support up to 230,769,157 before it starts overflowing.
      
      To understand where these values come from, consider that frequency
      adjustments have the form of:
      
      new_incval = base_incval + (base_incval * adjustment) / (unit of adjustment)
      
      The maximum adjustment is reported in terms of parts per billion:
      new_incval = base_incval + (base_incval * adjustment) / 1 billion
      
      The largest possible adjustment is thus given by the following:
      max_incval = base_incval + (base_incval * max_adj) / 1 billion
      
      Re-arranging to solve for max_adj:
      max_adj = (max_incval - base_incval) * 1 billion / base_incval
      
      We also need to ensure that negative adjustments cannot underflow. This can
      be achieved simply by ensuring max_adj is always less than 1 billion.
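      
      As a rough check of the rearranged equation, the sketch below computes
      max_adj in C. It rests on assumptions not stated in this message: that
      the increment value is a 24-bit field (so max_incval = 2^24 - 1), and
      that the base increment values mentioned in the note after the code are
      the ones in use. The helper name max_adj_ppb() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the increment value lives in a 24-bit register field. */
#define ASSUMED_MAX_INCVAL ((1ULL << 24) - 1)
#define PPB 1000000000ULL

/* Hypothetical helper: largest positive adjustment, in parts per billion,
 * that keeps base_incval + base_incval * adj / 1e9 within max_incval.
 * The result is also kept below 1 billion so that a negative adjustment of
 * the same magnitude cannot underflow. Inputs are expected to be small
 * enough that the multiplication fits in 64 bits (true for 24-bit values).
 */
static uint64_t max_adj_ppb(uint64_t base_incval, uint64_t max_incval)
{
	uint64_t adj;

	assert(base_incval > 0 && max_incval >= base_incval);

	/* max_adj = (max_incval - base_incval) * 1 billion / base_incval */
	adj = (max_incval - base_incval) * PPB / base_incval;

	return adj < PPB ? adj : PPB - 1;
}
```

      With assumed base values of 125 << 17, 40 << 18 and 26 << 19, this gives
      23,999,938, 599,999,904 and 230,769,157, the figures quoted above for the
      96 MHz, 25 MHz and 38.4 MHz clocks.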
      
      Introduce new macros in e1000.h codifying the maximum adjustment in PPB for
      each frequency given its associated increment values. Also clarify where
      these values come from by commenting about the above equations.
      
      Replace the switch statement in e1000e_ptp_init with one which mirrors the
      increment value switch statement from e1000e_get_base_timinca. For each
      device, assign the appropriate maximum adjustment based on its frequency.
      Some parts can have one of two frequency modes as determined by
      E1000_TSYNCRXCTL_SYSCFI.
      
      Since the new flow directly matches the assignments in
      e1000e_get_base_timinca, and uses well-defined macro names, it is much
      easier to verify that the resulting maximum adjustments are correct. It
      also avoids difficult-to-parse constructions such as the "hw->mac.type <
      e1000_phc_lpt" check, and the use of fallthrough, which was especially
      confusing when combined with a conditional block.
      
      Note that I believe the current increment value configuration used for
      24MHz clocks is sub-par, as it leaves at least 3 extra bits available in
      the INCVALUE register. However, fixing that requires more careful review of
      the clock rate and associated values.
      Reported-by: Trey Harrison <harrisondigitalmedia@gmail.com>
      Fixes: 68fe1d5d ("e1000e: Add Support for 38.4MHZ frequency")
      Fixes: d89777bf ("e1000e: add support for IEEE-1588 PTP")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      f1f6a6b1