1. 10 Jun, 2024 7 commits
    • Merge branch 'mlxsw-acl-fixes' · 8d466c8f
      David S. Miller authored
      Petr Machata says:
      
      ====================
      mlxsw: ACL fixes
      
      Ido Schimmel writes:
      
      Patches #1-#3 fix various spelling mistakes I noticed while working on
      the code base.
      
      Patch #4 fixes a general protection fault by warning and erroring out
      when object nesting is detected.
      
      Patch #5 fixes the root cause of the nesting, and with it the warning.
      
      Patch #6 fixes ACL scale regression and firmware errors.
      
      See the commit messages for more info.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl: Fix ACL scale regression and firmware errors · 75d8d7a6
      Ido Schimmel authored
      ACLs that reside in the algorithmic TCAM (A-TCAM) in Spectrum-2 and
      newer ASICs can share the same mask if their masks only differ in up to
      8 consecutive bits. For example, consider the following filters:
      
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 192.0.2.0/24 action drop
       # tc filter add dev swp1 ingress pref 1 proto ip flower dst_ip 198.51.100.128/25 action drop
      
      The second filter can use the same mask as the first (dst_ip/24) with a
      delta of 1 bit.
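      
      The sharing rule above can be sketched in user-space C. This is only an
      illustration on 32-bit IPv4 prefix masks, not the driver's code: the
      names mask_delta_span() and prefix_mask() are hypothetical, and the real
      driver works on wider encoded keys.

```c
#include <stdint.h>

#define DELTA_MAX_BITS 8 /* "up to 8 consecutive bits" per the text above */

/* Hypothetical helper: build an IPv4 prefix mask, e.g. 24 -> 0xffffff00. */
static uint32_t prefix_mask(unsigned int len)
{
	return len ? ~UINT32_C(0) << (32 - len) : 0;
}

/*
 * Hypothetical helper: return the width of the bit window in which
 * 'child' differs from 'parent', or -1 if 'child' cannot reuse 'parent'.
 * 'parent' must be a subset of 'child' and all extra bits must fit in
 * DELTA_MAX_BITS consecutive bit positions.
 */
static int mask_delta_span(uint32_t parent, uint32_t child)
{
	uint32_t diff = parent ^ child;
	int lo = 0, hi = 31, span;

	if ((parent & child) != parent)
		return -1; /* parent matches on bits the child ignores */
	if (!diff)
		return 0; /* identical masks, no delta needed */

	while (!(diff & (UINT32_C(1) << lo)))
		lo++;
	while (!(diff & (UINT32_C(1) << hi)))
		hi--;

	span = hi - lo + 1;
	return span <= DELTA_MAX_BITS ? span : -1;
}
```

      With these helpers, a /25 mask on top of a /24 mask yields a span of 1,
      matching the "delta of 1 bit" in the example, while a /25 mask on top of
      a /16 mask needs 9 bits and is rejected.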
      
      However, the above only works because the two filters have different
      values in the common unmasked part (dst_ip/24). When entries have the
      same value in the common unmasked part they create undesired collisions
      in the device since many entries now have the same key. This leads to
      firmware errors such as [1] and to a reduced scale.
      
      Fix by adjusting the hash table key to only include the value in the
      common unmasked part. That is, without including the delta bits. That
      way the driver will detect the collision during filter insertion and
      spill the filter into the circuit TCAM (C-TCAM).
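      
      A minimal sketch of the idea, assuming a single shared /24 mask and a
      small array in place of the driver's hash table. All names here
      (struct atcam_sketch, atcam_sketch_insert(), ipv4()) are hypothetical,
      and spilling when the array is full is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum placement { PLACED_ATCAM, PLACED_CTCAM };

#define ATCAM_SKETCH_MAX_KEYS 16

/* Stand-in for the driver's hash table: keys seen so far under one mask. */
struct atcam_sketch {
	uint32_t shared_mask; /* common mask, without the delta bits */
	uint32_t keys[ATCAM_SKETCH_MAX_KEYS];
	size_t count;
};

static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | d;
}

/*
 * The key only covers the value under the shared mask. Two filters that
 * differ only in their delta bits therefore produce the same key, and the
 * second one is spilled instead of colliding in the device.
 */
static enum placement atcam_sketch_insert(struct atcam_sketch *t,
					  uint32_t value)
{
	uint32_t key = value & t->shared_mask;
	size_t i;

	for (i = 0; i < t->count; i++)
		if (t->keys[i] == key)
			return PLACED_CTCAM; /* collision detected at insertion */

	if (t->count == ATCAM_SKETCH_MAX_KEYS)
		return PLACED_CTCAM; /* sketch-only assumption: spill when full */

	t->keys[t->count++] = key;
	return PLACED_ATCAM;
}
```

      With shared_mask set to 0xffffff00, inserting 192.0.2.0 and then
      198.51.100.128 places both in the A-TCAM, while a later 192.0.2.128
      maps to the same key as 192.0.2.0 and is spilled to the C-TCAM.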
      
      Add a test case that fails without the fix and adjust existing cases
      that check C-TCAM spillage according to the above limitation.
      
      [1]
      mlxsw_spectrum2 0000:06:00.0: EMAD reg access failed (tid=3379b18a00003394,reg_id=3027(ptce3),type=write,status=8(resource not available))
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl_erp: Fix object nesting warning · 97d833ce
      Ido Schimmel authored
      ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
      (A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
      contain more ACLs (i.e., tc filters), but the number of masks in each
      region (i.e., tc chain) is limited.
      
      In order to mitigate the effects of the above limitation, the device
      allows filters to share a single mask if their masks only differ in up
      to 8 consecutive bits. For example, dst_ip/25 can be represented using
      dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
      number of masks being used (and therefore does not support mask
      aggregation), but can contain a limited number of filters.
      
      The driver uses the "objagg" library to perform the mask aggregation by
      passing it objects that consist of the filter's mask and whether the
      filter is to be inserted into the A-TCAM or the C-TCAM since filters in
      different TCAMs cannot share a mask.
      
      The set of created objects is dependent on the insertion order of the
      filters and is not necessarily optimal. Therefore, the driver will
      periodically ask the library to compute a more optimal set ("hints") by
      looking at all the existing objects.
      
      When the library asks the driver whether two objects can be aggregated
      the driver only compares the provided masks and ignores the A-TCAM /
      C-TCAM indication. This is the right thing to do since the goal is to
      move as many filters as possible to the A-TCAM. The driver also forbids
      two identical masks from being aggregated since this can only happen if
      one was intentionally put in the C-TCAM to avoid a conflict in the
      A-TCAM.
      
      The above can result in the following set of hints:
      
      H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
      H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
      
      After getting the hints from the library the driver will start migrating
      filters from one region to another while consulting the computed hints
      and instructing the device to perform a lookup in both regions during
      the transition.
      
      Assuming a filter with mask X is being migrated into the A-TCAM in the
      new region, the hints lookup will return H1. Since H2 is the parent of
      H1, the library will try to find the object associated with it and
      create it if necessary in which case another hints lookup (recursive)
      will be performed. This hints lookup for {mask Y, A-TCAM} will either
      return H2 or H3 since the driver passes the library an object comparison
      function that ignores the A-TCAM / C-TCAM indication.
      
      This can eventually lead to nested objects which are not supported by
      the library [1].
      
      Fix by removing the object comparison function from both the driver and
      the library as the driver was the only user. That way the lookup will
      only return exact matches.
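      
      The ambiguity described above can be modelled with a small table of the
      four hints. This is a sketch and not the objagg implementation: struct
      hint, hint_lookup() and the table order are hypothetical, with H3
      deliberately placed before H2 to show the problematic outcome.

```c
#include <stdbool.h>
#include <stddef.h>

enum tcam { ATCAM, CTCAM };

struct hint {
	char mask; /* 'X', 'Y' or 'Z' as in the hints listed above */
	enum tcam tcam;
	const struct hint *parent;
};

static const struct hint h4 = { 'Z', ATCAM, NULL };
static const struct hint h3 = { 'Y', CTCAM, &h4 };
static const struct hint h2 = { 'Y', ATCAM, NULL };
static const struct hint h1 = { 'X', ATCAM, &h2 };

/* Lookup order is arbitrary in a hash table; H3 happens to come first. */
static const struct hint *const hints[] = { &h4, &h3, &h2, &h1 };

/*
 * With ignore_tcam set, the lookup behaves like the removed comparison
 * function and matches on the mask alone. Without it, only exact
 * {mask, TCAM} matches are returned.
 */
static const struct hint *hint_lookup(char mask, enum tcam tcam,
				      bool ignore_tcam)
{
	size_t i;

	for (i = 0; i < sizeof(hints) / sizeof(hints[0]); i++) {
		if (hints[i]->mask != mask)
			continue;
		if (ignore_tcam || hints[i]->tcam == tcam)
			return hints[i];
	}
	return NULL;
}
```

      In this model, looking up {mask Y, A-TCAM} while ignoring the TCAM
      indication returns H3, which itself has a parent (H4), so H1's parent
      would be nested. The exact lookup returns H2, which has no parent.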
      
      I do not have a reliable reproducer that can reproduce the issue in a
      timely manner, but before the fix the issue would reproduce in several
      minutes and with the fix it does not reproduce in over an hour.
      
      Note that the current usefulness of the hints is limited because they
      include the C-TCAM indication and represent aggregation that cannot
      actually happen. This will be addressed in net-next.
      
      [1]
      WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
      Modules linked in:
      CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
      Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
      [...]
      Call Trace:
       <TASK>
       __objagg_obj_get+0x2bb/0x580
       objagg_obj_get+0xe/0x80
       mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
       mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: objagg: Fix general protection fault · b4a3a89f
      Ido Schimmel authored
      The library supports aggregation of objects into other objects only if
      the parent object does not have a parent itself. That is, nesting is not
      supported.
      
      Aggregation happens in two cases: Without and with hints, where hints
      are a pre-computed recommendation on how to aggregate the provided
      objects.
      
      Nesting is not possible in the first case due to a check that prevents
      it, but in the second case there is no check because the assumption is
      that nesting cannot happen when creating objects based on hints. The
      violation of this assumption leads to various warnings and eventually to
      a general protection fault [1].
      
      Before fixing the root cause, error out when nesting happens and warn.
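      
      The kind of check described here can be sketched in user-space C as
      follows. The names struct obj and obj_parent_assign() are hypothetical,
      and fprintf() stands in for the kernel's warning facility. This is not
      the actual patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an aggregated object. */
struct obj {
	struct obj *parent;
};

/*
 * Refuse to aggregate 'child' into a parent that is itself aggregated
 * into another object, since nesting is not supported. Warn and return
 * an error instead of continuing with a broken hierarchy.
 */
static int obj_parent_assign(struct obj *child, struct obj *parent)
{
	if (parent->parent) {
		fprintf(stderr, "warning: nested aggregation rejected\n");
		return -EINVAL;
	}

	child->parent = parent;
	return 0;
}
```

      Assigning a root object as a parent succeeds, while assigning an
      object that already has a parent returns -EINVAL and leaves the child
      untouched.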
      
      [1]
      general protection fault, probably for non-canonical address 0xdead000000000d90: 0000 [#1] PREEMPT SMP PTI
      CPU: 1 PID: 1083 Comm: kworker/1:9 Tainted: G        W          6.9.0-rc6-custom-gd9b4f1cca7fb #7
      Hardware name: Mellanox Technologies Ltd. MSN3700/VMOD0005, BIOS 5.11 01/06/2019
      Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
      RIP: 0010:mlxsw_sp_acl_erp_bf_insert+0x25/0x80
      [...]
      Call Trace:
       <TASK>
       mlxsw_sp_acl_atcam_entry_add+0x256/0x3c0
       mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
       mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
       mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
       process_one_work+0x151/0x370
       worker_thread+0x2cb/0x3e0
       kthread+0xd0/0x100
       ret_from_fork+0x34/0x50
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      Fixes: 9069a381 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
      Reported-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_acl_atcam: Fix wrong comment · 06fcdf24
      Ido Schimmel authored
      The key is encoded, not encrypted.
      
      Fixes: c22291f7 ("mlxsw: spectrum: acl: Implement delta for ERP")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: test_objagg: Fix spelling · 2aad28ec
      Ido Schimmel authored
      Fixes: 0a020d41 ("lib: introduce initial implementation of object aggregation manager")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lib: objagg: Fix spelling · c1e156ae
      Ido Schimmel authored
      Fixes: 0a020d41 ("lib: introduce initial implementation of object aggregation manager")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Tested-by: Alexander Zubkov <green@qrator.net>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Jun, 2024 3 commits
  3. 07 Jun, 2024 2 commits
  4. 06 Jun, 2024 28 commits