1. 11 Jun, 2024 12 commits
  2. 10 Jun, 2024 18 commits
    • Merge branch 'fix-changing-dsa-conduit' · 2ba6d157
      David S. Miller authored
      Marek Behún says:
      
      ====================
      Fix changing DSA conduit
      
      This series fixes an issue in the DSA code related to host interface UC
      address installed into port FDB and port conduit address database when
      live-changing port conduit.
      
      The first patch refactors and deduplicates the installation and
      uninstallation of the interface's MAC address, and the second patch
      fixes the issue.
      
      Cover letter for v1 and v2:
        https://patchwork.kernel.org/project/netdevbpf/cover/20240429163627.16031-1-kabel@kernel.org/
        https://patchwork.kernel.org/project/netdevbpf/cover/20240502122922.28139-1-kabel@kernel.org/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ba6d157
    • net: dsa: update the unicast MAC address when changing conduit · eef8e906
      Marek Behún authored
      When changing DSA user interface conduit while the user interface is up,
      DSA exhibits different behavior in comparison to when the interface is
      down. This different behavior concerns the primary unicast MAC address
      stored in the port standalone FDB and in the conduit device UC database.
      
      If we put a switch port down while changing the conduit with
        ip link set sw0p0 down
        ip link set sw0p0 type dsa conduit conduit1
        ip link set sw0p0 up
      we delete the address in dsa_user_close() and install the (possibly
      different) address in dsa_user_open().
      
      But when changing the conduit on the fly, the old address is not
      deleted and the new one is not installed.
      
      Since we explicitly want to support live-changing the conduit, uninstall
      the old address before calling dsa_port_assign_conduit() and install the
      (possibly different) new address after the call.
      
      Because conduit change might also trigger address change (the user
      interface is supposed to inherit the conduit interface MAC address if no
      address is defined in hardware (dp->mac is a zero address)), move the
      eth_hw_addr_inherit() call from dsa_user_change_conduit() to
      dsa_port_change_conduit(), just before installing the new address.
      
      Although this is in theory a flaw in DSA core, it need not be
      backported, since there is currently no DSA driver that can be affected
      by this. The only DSA driver that supports changing conduit is felix,
      and, as explained by Vladimir Oltean [1]:
      
        There are 2 reasons why with felix the bug does not manifest itself.
      
        First is because both the 'ocelot' and the alternate 'ocelot-8021q'
        tagging protocols have the 'promisc_on_conduit = true' flag. So the
        unicast address doesn't have to be in the conduit's RX filter -
        neither the old nor the new conduit.
      
        Second, dsa_user_host_uc_install() theoretically leaves behind host
        FDB entries installed towards the wrong (old) CPU port. But in
        felix_fdb_add(), we treat any FDB entry requested towards any CPU port
        as if it was a multicast FDB entry programmed towards _all_ CPU ports.
        For that reason, it is installed towards the port mask of the PGID_CPU
        port group ID:
      
      	if (dsa_port_is_cpu(dp))
      		port = PGID_CPU;
      
      Therefore no Fixes tag for this change.
      
      [1] https://lore.kernel.org/netdev/20240507201827.47suw4fwcjrbungy@skbuf/

      Signed-off-by: Marek Behún <kabel@kernel.org>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Tested-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eef8e906
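The ordering described in this commit (uninstall the old address, switch conduit, optionally inherit the conduit's MAC, install the new address) can be modelled in a few lines of userspace C. This is only a sketch: `struct conduit`, `struct user_port`, `host_uc_install()`, `host_uc_uninstall()` and `change_conduit()` are hypothetical stand-ins for the kernel objects and helpers named above, the conduit's UC list is reduced to a single slot, and the port FDB is not modelled at all.

```c
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical, simplified stand-in for a conduit net_device. */
struct conduit {
	unsigned char dev_addr[ETH_ALEN];
	unsigned char uc_addr[ETH_ALEN];	/* one-slot model of the UC list */
	bool uc_installed;
};

/* Hypothetical, simplified stand-in for a DSA user port. */
struct user_port {
	struct conduit *conduit;
	unsigned char dev_addr[ETH_ALEN];
	bool has_hw_mac;	/* false models a zero dp->mac */
	bool up;
};

/* Add the port address to the conduit unless it equals the conduit's own. */
static void host_uc_install(struct user_port *p)
{
	struct conduit *c = p->conduit;

	if (memcmp(p->dev_addr, c->dev_addr, ETH_ALEN) != 0) {
		memcpy(c->uc_addr, p->dev_addr, ETH_ALEN);
		c->uc_installed = true;
	}
}

/* Reverse of host_uc_install(). */
static void host_uc_uninstall(struct user_port *p)
{
	struct conduit *c = p->conduit;

	if (memcmp(p->dev_addr, c->dev_addr, ETH_ALEN) != 0)
		c->uc_installed = false;
}

/* Live conduit change: uninstall, reassign, maybe inherit, install. */
static void change_conduit(struct user_port *p, struct conduit *new_conduit)
{
	if (p->up)
		host_uc_uninstall(p);

	p->conduit = new_conduit;	/* stands in for dsa_port_assign_conduit() */

	/* Inherit the conduit MAC only when no hardware address is defined. */
	if (!p->has_hw_mac)
		memcpy(p->dev_addr, new_conduit->dev_addr, ETH_ALEN);

	if (p->up)
		host_uc_install(p);
}
```

In this model, changing the conduit of a port that is up leaves no stale entry on the old conduit and installs the (possibly inherited) address on the new one, which matches what the down/change/up sequence achieves through dsa_user_close() and dsa_user_open().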
    • net: dsa: deduplicate code adding / deleting the port address to fdb · 77f75412
      Marek Behún authored
      The sequence
        if (dsa_switch_supports_uc_filtering(ds))
          dsa_port_standalone_host_fdb_add(dp, addr, 0);
        if (!ether_addr_equal(addr, conduit->dev_addr))
          dev_uc_add(conduit, addr);
      is executed both in dsa_user_open() and dsa_user_set_mac_addr().
      
      Its reverse is executed both in dsa_user_close() and
      dsa_user_set_mac_addr().
      
      Refactor these sequences into new functions dsa_user_host_uc_install()
      and dsa_user_host_uc_uninstall().
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77f75412
    • Merge branch 'rtnetlink-rtnl_lock' · 395059c5
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      rtnetlink: move rtnl_lock handling out of af_netlink
      
      With the changes done in commit 5b4b62a1 ("rtnetlink: make
      the "split" NLM_DONE handling generic") we can also move the
      rtnl locking out of af_netlink.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      395059c5
    • net: netlink: remove the cb_mutex "injection" from netlink core · 5fbf57a9
      Jakub Kicinski authored
      Back in 2007, in commit af65bdfc ("[NETLINK]: Switch cb_lock spinlock
      to mutex and allow to override it") netlink core was extended to allow
      subsystems to replace the dump mutex lock with their own lock.
      
      The mechanism was used by rtnetlink to take rtnl_lock but it isn't
      sufficiently flexible for other users. Over the 17 years since
      it was added no other user appeared. Since rtnetlink needs conditional
      locking now, and doesn't use it either, axe this feature completely.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fbf57a9
    • rtnetlink: move rtnl_lock handling out of af_netlink · 5380d64f
      Jakub Kicinski authored
      Now that we have an intermediate layer of code for handling
      rtnl-level netlink dump quirks, we can move the rtnl_lock
      taking there.
      
      For dump handlers with RTNL_FLAG_DUMP_SPLIT_NLM_DONE we can
      avoid taking rtnl_lock just to generate NLM_DONE, once again.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5380d64f
    • net: dsa: hellcreek: Replace kernel.h with what is used · c917b26e
      Andy Shevchenko authored
      kernel.h is included solely for some other existing headers.
      Include them directly and get rid of kernel.h.
      
      While at it, sort headers alphabetically for easier maintenance.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c917b26e
    • Merge branch 'tcp-up-pin-tw-timer' · a9522664
      David S. Miller authored
      Florian Westphal says:
      
      ====================
      net: tcp: un-pin tw timer
      
      Changes since previous iteration:
       - Patch 1: update a comment; I copied Eric's v7 RvB tag.
       - Patch 2: move bh off/on into hashdance_schedule and get rid of
         comment mentioning pinned tw timer.
         I did not copy Eric's RvB tag over from v7 because of the change.
       - Patch 3 is unchanged, so I kept Eric's RvB tag.
      
      This is v8 of the series where the tw_timer is un-pinned to get rid of
      interference in isolated-CPU setups.
      
      The first patch makes the necessary preparations, since existing code
      relies on TIMER_PINNED to avoid races.
      
      Second patch un-pins the TW timer. Could be folded into the first one,
      but it might help wrt. bisection.
      
      Third patch is a minor cleanup to move a helper from .h to the only
      remaining compilation unit.
      
      Tested with iperf3 and stress-ng socket mode.
      ====================
      Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9522664
    • tcp: move inet_twsk_schedule helper out of header · f81d0dd2
      Florian Westphal authored
      It's no longer used outside inet_timewait_sock.c, so move it there.
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f81d0dd2
    • net: tcp: un-pin the tw_timer · c75ad7c7
      Florian Westphal authored
      After the previous patch, even if the timer fires immediately on another
      CPU, the context that schedules the timer now holds the ehash spinlock,
      so the timer cannot reap the tw socket until the ehash lock is released.
      
      BH disable is moved into hashdance_schedule.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c75ad7c7
    • net: tcp/dccp: prepare for tw_timer un-pinning · b334b924
      Valentin Schneider authored
      The TCP timewait timer is proving to be problematic for setups where
      scheduler CPU isolation is achieved at runtime via cpusets (as opposed to
      statically via isolcpus=domains).
      
      What happens there is that a CPU goes through tcp_time_wait(), arming the
      time_wait timer, then gets isolated. TCP_TIMEWAIT_LEN later, the timer
      fires, causing interference for the now-isolated CPU. This is conceptually
      similar to the issue described in commit e02b9312 ("workqueue: Unbind
      kworkers before sending them to exit()").
      
      Move inet_twsk_schedule() to within inet_twsk_hashdance(), with the ehash
      lock held. Expand the lock's critical section from inet_twsk_kill() to
      inet_twsk_deschedule_put(), serializing the scheduling vs descheduling of
      the timer. IOW, this prevents the following race:
      
      			     tcp_time_wait()
      			       inet_twsk_hashdance()
        inet_twsk_deschedule_put()
          del_timer_sync()
      			       inet_twsk_schedule()
      
      Thanks to Paolo Abeni for suggesting to leverage the ehash lock.
      
      This also restores a comment from commit ec94c269 ("tcp/dccp: avoid
      one atomic operation for timewait hashdance") as inet_twsk_hashdance() had
      a "Step 1" and "Step 3" comment, but the "Step 2" had gone missing.
      
      inet_twsk_deschedule_put() now acquires the ehash spinlock to synchronize
      with inet_twsk_hashdance_schedule().
      
      To ease a possible regression search, the actual un-pin is done in the
      next patch.
      
      Link: https://lore.kernel.org/all/ZPhpfMjSiHVjQkTk@localhost.localdomain/
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b334b924
    • Merge branch 'mlxsw-acl-fixes' · 8d466c8f
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: ACL fixes
      
      Ido Schimmel writes:
      
      Patches #1-#3 fix various spelling mistakes I noticed while working on
      the code base.
      
      Patch #4 fixes a general protection fault by bailing out when the error
      occurs and warning.
      
      Patch #5 fixes the warning.
      
      Patch #6 fixes ACL scale regression and firmware errors.
      
      See the commit messages for more info.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d466c8f
    • mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors · 75d8d7a6
      Ido Schimmel authored
      ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
      newer ASICs can share the same mask if their masks only differ in up to
      8 consecutive bits. For example, consider the following filters:
      
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop
      
      The second filter can use the same mask as the first (dst_ip/24) with a
      delta of 1 bit.
      
      However, the above only works because the two filters have different
      values in the common unmasked part (dst_ip/24). When entries have the
      same value in the common unmasked part they create undesired collisions
      in the device since many entries now have the same key. This leads to
      firmware errors such as [1] and to a reduced scale.
      
      Fix by adjusting the hash table key to only include the value in the
      common unmasked part. That is, without including the delta bits. That
      way the driver will detect the collision during filter insertion and
      spill the filter into the circuit TCAM (C-TCAM).
      
      Add a test case that fails without the fix and adjust existing cases
      that check C-TCAM spillage according to the above limitation.
      
      [1]
      mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75d8d7a6
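To make the mask-sharing rule and the collision condition concrete, here is a small userspace sketch for IPv4 prefixes. It is an illustration under stated assumptions, not the driver's code: `prefix_mask()`, `mask_delta_bits()`, `common_key()` and `collides()` are hypothetical helpers, addresses are host-order 32-bit values, and "up to 8 consecutive bits" is modelled as the span between the lowest and highest differing bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DELTA_BITS 8	/* "up to 8 consecutive bits" per the commit text */

/* IPv4 prefix length (0..32) to a host-order netmask. */
static uint32_t prefix_mask(unsigned int len)
{
	return len == 0 ? 0 : (uint32_t)(0xFFFFFFFFu << (32 - len));
}

/*
 * Number of delta bits needed for `mask` to share `parent`,
 * or -1 if the two masks cannot share.
 */
static int mask_delta_bits(uint32_t parent, uint32_t mask)
{
	uint32_t diff = parent ^ mask;
	int lo = 0, hi = 31, width;

	/* The shared mask must be the less specific of the two. */
	if ((parent & mask) != parent)
		return -1;
	if (diff == 0)
		return 0;

	while (!(diff & (UINT32_C(1) << lo)))
		lo++;
	while (!(diff & (UINT32_C(1) << hi)))
		hi--;

	width = hi - lo + 1;
	return width <= MAX_DELTA_BITS ? width : -1;
}

/* Value in the part covered by the shared mask, without the delta bits. */
static uint32_t common_key(uint32_t addr, uint32_t parent)
{
	return addr & parent;
}

/* Two entries collide when their values in the common part are equal. */
static bool collides(uint32_t a, uint32_t b, uint32_t parent)
{
	return common_key(a, parent) == common_key(b, parent);
}
```

With these helpers, /25 shares /24 with a delta of 1 bit. 192.0.2.0 (0xC0000200) and 198.51.100.128 (0xC6336480) do not collide under /24, whereas 192.0.2.0 and 192.0.2.128 (0xC0000280) do. Keying the hash table on the common part alone lets the collision be detected at insertion time.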
    • mlxsw: spectrum_acl_erp: Fix object nesting warning · 97d833ce
      Ido Schimmel authored
      ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
      (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
      contain more ACLs (i.e., tc filters), but the number of masks in each
      region (i.e., tc chain) is limited.
      
      In order to mitigate the effects of the above limitation, the device
      allows filters to share a single mask if their masks only differ in up
      to 8 consecutive bits. For example, dst_ip/25 can be represented using
      dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
      number of masks being used (and therefore does not support mask
      aggregation), but can contain a limited number of filters.
      
      The driver uses the "objagg" library to perform the mask aggregation by
      passing it objects that consist of the filter's mask and whether the
      filter is to be inserted into the A-TCAM or the C-TCAM since filters in
      different TCAMs cannot share a mask.
      
      The set of created objects is dependent on the insertion order of the
      filters and is not necessarily optimal. Therefore, the driver will
      periodically ask the library to compute a more optimal set ("hints") by
      looking at all the existing objects.
      
      When the library asks the driver whether two objects can be aggregated
      the driver only compares the provided masks and ignores the A-TCAM /
      C-TCAM indication. This is the right thing to do since the goal is to
      move as many filters as possible to the A-TCAM. The driver also forbids
      two identical masks from being aggregated since this can only happen if
      one was intentionally put in the C-TCAM to avoid a conflict in the
      A-TCAM.
      
      The above can result in the following set of hints:
      
      H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
      H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
      
      After getting the hints from the library the driver will start migrating
      filters from one region to another while consulting the computed hints
      and instructing the device to perform a lookup in both regions during
      the transition.
      
      Assuming a filter with mask X is being migrated into the A-TCAM in the
      new region, the hints lookup will return H1. Since H2 is the parent of
      H1, the library will try to find the object associated with it and
      create it if necessary in which case another hints lookup (recursive)
      will be performed. This hints lookup for {mask Y, A-TCAM} will either
      return H2 or H3 since the driver passes the library an object comparison
      function that ignores the A-TCAM / C-TCAM indication.
      
      This can eventually lead to nested objects which are not supported by
      the library [1].
      
      Fix by removing the object comparison function from both the driver and
      the library as the driver was the only user. That way the lookup will
      only return exact matches.
      
      I do not have a reliable reproducer that can reproduce the issue in a
      timely manner, but before the fix the issue would reproduce in several
      minutes and with the fix it does not reproduce in over an hour.
      
      Note that the current usefulness of the hints is limited because they
      include the C-TCAM indication and represent aggregation that cannot
      actually happen. This will be addressed in net-next.
      
      [1]
      WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
      Modules linked in:
      CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
      Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
      [...]
      Call Trace:
       <TASK>
       __objagg_obj_get+0x2bb/0x580
       objagg_obj_get+0xe/0x80
       mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
       mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97d833ce
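The difference between a hints lookup that ignores the A-TCAM / C-TCAM indication and one that only returns exact matches can be shown with a toy table. This is a sketch only: the real library uses a hash table and its own key layout, and `struct hint_key`, `struct hint`, `lookup_loose()` and `lookup_exact()` are hypothetical names introduced for illustration.

```c
#include <stddef.h>

enum tcam { A_TCAM, C_TCAM };

/* Toy object key: a mask identifier plus the TCAM indication. */
struct hint_key {
	unsigned int mask;
	enum tcam tcam;
};

struct hint {
	struct hint_key key;
	const struct hint *parent;	/* NULL for a root hint */
};

/* Pre-fix behaviour in this model: the TCAM indication is ignored. */
static const struct hint *lookup_loose(const struct hint *tbl, size_t n,
				       struct hint_key key)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].key.mask == key.mask)
			return &tbl[i];
	return NULL;
}

/* Post-fix behaviour in this model: only exact matches are returned. */
static const struct hint *lookup_exact(const struct hint *tbl, size_t n,
				       struct hint_key key)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].key.mask == key.mask && tbl[i].key.tcam == key.tcam)
			return &tbl[i];
	return NULL;
}
```

If the table happens to hold H3 {mask Y, C-TCAM} before H2 {mask Y, A-TCAM}, a loose lookup for {mask Y, A-TCAM} returns H3, which itself has a parent (H4), so using it as the parent of H1's object would create nesting. The exact lookup returns H2, which is a root.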
    • lib: objagg: Fix general protection fault · b4a3a89f
      Ido Schimmel authored
      The library supports aggregation of objects into other objects only if
      the parent object does not have a parent itself. That is, nesting is not
      supported.
      
      Aggregation happens in two cases: Without and with hints, where hints
      are a pre-computed recommendation on how to aggregate the provided
      objects.
      
      Nesting is not possible in the first case due to a check that prevents
      it, but in the second case there is no check because the assumption is
      that nesting cannot happen when creating objects based on hints. The
      violation of this assumption leads to various warnings and eventually to
      a general protection fault [1].
      
      Before fixing the root cause, error out when nesting happens and warn.
      
      [1]
      general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
      CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G        W          6.9.0-rc6-custom-gd9b4f1cca7fb #7
      Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
      [...]
      Call Trace:
       <TASK>
       mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
       worker_thread+0x2cb/0x3e0
       kthread+0xd0/0x100
       ret_from_fork+0x34/0x50
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4a3a89f
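The "error out when nesting happens" check can be sketched as follows. This is a minimal userspace model, assuming a much simpler object than the library's: `struct agg_obj` and `agg_obj_parent_assign()` are hypothetical names, and the warning that the real fix emits is omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Toy aggregation object: a parent pointer and a user count. */
struct agg_obj {
	struct agg_obj *parent;	/* NULL for a root object */
	unsigned int users;
};

/*
 * Aggregate `obj` into `parent`. Refuse when `parent` already has a
 * parent of its own, since nesting is not supported.
 */
static int agg_obj_parent_assign(struct agg_obj *obj, struct agg_obj *parent)
{
	if (parent->parent != NULL)
		return -EINVAL;

	obj->parent = parent;
	parent->users++;
	return 0;
}
```

Assigning a root as the parent succeeds. Assigning an object that is already aggregated as the parent fails with -EINVAL and leaves both objects untouched, so the caller can unwind instead of continuing with an invalid hierarchy.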
    • mlxsw: spectrum_acl_atcam: Fix wrong comment · 06fcdf24
      Ido Schimmel authored
      The key is encoded, not encrypted.
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      06fcdf24
    • lib: test_objagg: Fix spelling · 2aad28ec
      Ido Schimmel authored
      Fixes: 0a020d41 ("lib: introduce initial implementation of object aggregation manager")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2aad28ec
    • lib: objagg: Fix spelling · c1e156ae
      Ido Schimmel authored
      Fixes: 0a020d41 ("lib: introduce initial implementation of object aggregation manager")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1e156ae
  3. 09 Jun, 2024 3 commits
  4. 07 Jun, 2024 2 commits
  5. 06 Jun, 2024 5 commits